Breakouts
BRK2-120 • Kubernetes, Cloud Runtimes • Technical
How OpenAI builds Kubernetes GPU clusters
Mandalay Bay H
9:45 AM - 10:30 AM
AI model producers are pushing Kubernetes to unprecedented scales. Join us to learn how OpenAI uses Google Cloud’s accelerator infrastructure for complex, multi-node inference. We’ll dive into building and maintaining massive clusters using the latest NVIDIA GB200 and GB300 GPUs, and cover critical concepts like NVLink domains, RDMA over Converged Ethernet (RoCE) networking, and topology-aware scheduling. Get battle-tested tactics for handling node failures and maximizing uptime directly from the teams operating the world’s largest AI workloads.