Breakouts
BRK3-032 • Kubernetes, Open Models • Advanced Technical
Large-scale LLM inference on GKE
Mandalay Bay H
1:30 PM - 2:15 PM
Stop overspending on AI infrastructure. High costs and GPU scarcity shouldn’t stall your innovation. In this session, learn how to turn Google Kubernetes Engine (GKE) into a powerhouse for LLM inference. We’ll move beyond basic deployments to master specific optimizations for model loading, accelerator utilization, and smart load balancing. By the end, you’ll know how to architect GKE environments that balance speed and availability while slashing operational costs. Don’t just run models: run them efficiently, reliably, and at scale. Build faster, spend smarter, and maximize every T4, L4, and H100 in your fleet.
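As a flavor of the kind of GKE setup the session covers, here is a minimal sketch of an inference Deployment pinned to GPU nodes. The image, app name, and replica count are illustrative assumptions, not details from the session; only the `cloud.google.com/gke-accelerator` node label and the `nvidia.com/gpu` resource name are standard GKE/Kubernetes conventions.

```yaml
# Illustrative sketch only: image and names are assumptions, not from the session.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference          # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      nodeSelector:
        # Standard GKE label: schedule pods onto nodes with NVIDIA L4 GPUs
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: server
        image: vllm/vllm-openai:latest   # assumed serving image; swap for your own
        resources:
          limits:
            nvidia.com/gpu: "1"          # request one GPU per replica
```

From a baseline like this, the optimizations discussed in the session (faster model loading, higher accelerator utilization, smarter load balancing) layer on top.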