Breakouts
BRK3-032 • Kubernetes, Open Models • Advanced Technical
Large-scale LLM inference on GKE
Mandalay Bay H
1:30 PM - 2:15 PM
Stop overspending on AI infrastructure. High costs and GPU scarcity shouldn’t stall your innovation. In this session, learn how to turn Google Kubernetes Engine (GKE) into a powerhouse for LLM inference. We’ll move beyond basic deployments to master specific optimizations for model loading, accelerator utilization, and smart load balancing. By the end, you’ll know how to architect GKE environments that balance speed and availability while slashing operational costs. Don’t just run models: run them efficiently, reliably, and at scale. Build faster, spend smarter, and maximize every T4, L4, and H100 in your fleet.
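As a flavor of the kind of GKE setup the session covers, here is a minimal sketch of an inference Deployment pinned to GPU nodes. The image, app name, and replica count are illustrative assumptions, not details from the session; only the `cloud.google.com/gke-accelerator` node label and the `nvidia.com/gpu` resource name are standard GKE/Kubernetes conventions.

```yaml
# Illustrative sketch only: image and names are assumptions, not from the session.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference          # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      nodeSelector:
        # Standard GKE label: schedule pods onto nodes with NVIDIA L4 GPUs
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: server
        image: vllm/vllm-openai:latest   # assumed serving image; swap for your own
        resources:
          limits:
            nvidia.com/gpu: "1"          # request one GPU per replica
```

From a baseline like this, the optimizations discussed in the session (faster model loading, higher accelerator utilization, smarter load balancing) layer on top.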