Lightning Talks
DEVLT-301 • Architecture, Compute, Storage • Advanced Technical
LLM Inference on GKE for the rest of us
Developer Theater
3:00 PM - 3:25 PM
Learn to deploy LLMs efficiently without a hyperscale budget. This session covers practical strategies for optimizing LLM inference on Kubernetes, balancing performance, scalability, and cost. We’ll dive into container and model optimization, accelerator management, storage, load balancing, and observability. Walk away with actionable tools to maximize the cost-to-performance ratio of your AI workloads.