Discussion Groups
BRK1-069-DG • Networking • Advanced Technical
AI inference: Performance when you need it, economy when you don't
Reef B
5:15 PM - 5:45 PM
As generative AI moves from pilot to production, a one-size-fits-all approach to serving is no longer viable. Platform teams face a dual challenge: meeting stringent SLOs for real-time applications while minimizing TCO for high-volume, cost-sensitive workloads. In this session, we unveil GKE Inference Gateway innovations designed to solve both sides of this equation. Explore how GKE delivers multi-accelerator flexibility and disaggregated serving for optimal price-performance. We'll demonstrate how to protect models using Model Armor, agent identity, and token-based quotas. Finally, discover how to optimize inference with accelerators spread across multiple cloud regions.