Breakouts
BRK2-116 • Kubernetes, Cloud Runtimes • Technical
Beyond fine-tuning: The next frontier of training and RL on GKE
Location: Surf C
Time: 11:00 AM - 11:45 AM
Large-scale reinforcement learning (RL) creates infrastructure bottlenecks in the sampling and training cycle. Join this session to explore Mistral AI's RL strategies and Anyscale's high-performance Ray on Google Kubernetes Engine (GKE). We'll analyze the GKE primitives that shorten RL loop times, focusing on sampling, weight transfer, and sandboxing for isolation. You'll leave with recommendations for making clusters resilient to hardware failures and preemptions. These recommendations have been validated through RL on both smaller models and large-scale mixture-of-experts (MoE) architectures.