Discussion Groups
BRK1-068A-DG • Compute • Technical
Architecting reinforcement learning workflows for LLMs
Reef F
8:00 AM - 8:30 AM
Post-training is where real model value is created. Join this discussion to explore the next generation of scalable alignment using Tunix, a JAX-native toolkit. We'll debate the tradeoffs of advanced techniques like DPO, PPO, and RLOO, and dissect the architecture required to run them on Cloud TPUs. You'll have the chance to share your experience taking workflows from local research to multi-node clusters. Whether you are a JAX expert or just curious about switching to JAX, come discuss how to democratize efficient LLM alignment.