Discussion Groups
BRK1-069-DG • Networking • Advanced Technical
AI inference: Performance when you need it, economy when you don't
Reef B
5:15 PM - 5:45 PM
As generative AI moves from pilot to production, a one-size-fits-all approach to serving is no longer viable. Platform teams face a dual challenge: meeting stringent SLOs for real-time applications while minimizing TCO for high-volume, cost-sensitive workloads. In this session, we unveil GKE Inference Gateway innovations designed to solve both sides of this equation. Explore how GKE delivers multi-accelerator flexibility and disaggregated serving for optimal price-performance. We'll demonstrate how to protect models using Model Armor, agent identity, and token-based quotas. Finally, discover how to optimize inference with accelerators spread across multiple cloud regions.