Breakouts
BRK3-029 • Kubernetes, Cloud Runtimes • Advanced Technical
Platform engineering for AI: Architect a unified stack on GKE
Oceanside C
8:30 AM - 9:15 AM
Siloed AI model development wastes resources and slows innovation. The future of AI platform engineering is a unified substrate. Learn to architect a multi-tenant AI training platform on Google Kubernetes Engine (GKE) that’s optimized for large-scale workloads on specialized Tensor Processing Unit (TPU) and GPU hardware. In this session, we’ll demonstrate how to use MultiKueue and the wider Kubernetes ecosystem for sophisticated job queuing and quota sharing, so you can efficiently manage and scale AI workloads across a fleet of clusters. Dive deep into dynamic resource allocation and topology-aware scheduling to maximize performance, and leave with a reference architecture for a unified, multi-cluster AI operating system.