Breakouts
BRK2-113 • Kubernetes, Cloud Runtimes • Technical
The GKE inference playbook: Optimize cost and performance
Location: Mandalay Bay F
Time: 12:30 PM - 1:15 PM
Balancing inference latency and cost is one of generative AI’s toughest challenges. This session delivers a playbook for building a high-performance, cost-efficient inference stack on Google Kubernetes Engine (GKE). Explore innovations for scaling state-of-the-art models, crushing cold starts, and maximizing utilization with Ironwood Tensor Processing Units (TPUs) and vLLM. We’ll take a deep dive into GKE Inference Gateway, demonstrating service level objective (SLO)-based routing across clusters to keep your application responsive and your budget intact.
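To give a flavor of what SLO-based routing means, here is a minimal, purely illustrative sketch of the idea: pick the cheapest backend whose observed latency still meets the SLO, and fall back to the fastest one otherwise. The `Backend` type, its fields, and the policy itself are assumptions for illustration, not the actual GKE Inference Gateway API or its routing algorithm.

```python
from dataclasses import dataclass

# Illustrative only: this type and the routing policy below are
# hypothetical, not the GKE Inference Gateway API.

@dataclass
class Backend:
    name: str                  # e.g. a model-server pool in some cluster
    p95_latency_ms: float      # observed p95 latency for this pool
    cost_per_1k_tokens: float  # relative serving cost

def route(backends: list[Backend], slo_p95_ms: float) -> Backend:
    """Cheapest backend that meets the latency SLO;
    if none qualifies, fall back to the lowest-latency one."""
    meeting_slo = [b for b in backends if b.p95_latency_ms <= slo_p95_ms]
    if meeting_slo:
        return min(meeting_slo, key=lambda b: b.cost_per_1k_tokens)
    return min(backends, key=lambda b: b.p95_latency_ms)
```

With backends at (120 ms, 0.8), (300 ms, 0.5), and (900 ms, 0.2) cost per 1k tokens, a 400 ms SLO routes to the 300 ms pool (cheapest that qualifies), while an unattainable 50 ms SLO falls back to the 120 ms pool.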