Solution Talks
SOL334 • Compute • Advanced Technical
Supporting large-scale AI workloads with MIGs (Turnup and IMEX Config)
Solution Talks 3 - Palm H
2:00 PM - 2:45 PM
Join us for an unfiltered look at building massive-scale AI clusters outside of managed services such as GKE. We will begin by level-setting on core GCP VM and GPU architecture, covering the specific turnup requirements for advanced A3/A4+ nodes running as part of MIGs. We will then share the business constraints that led us to bypass GKE and build a platform to serve complex, multi-node inference and other demanding workloads. Finally, we'll explore the job scheduling strategies essential for running large-scale workloads on this bespoke platform.