Breakouts
BRK2-245 • Compute, Kubernetes, Storage, Open Models, Agents • Technical
Scaling inference for reasoning models & agents with NVIDIA on GKE
Location: Surf C
Time: 5:00 PM - 5:45 PM
As AI models scale beyond single-GPU capacity, the gap between peak hardware performance and real-world serving widens. NVIDIA Dynamo is an open-source inference framework for large-scale serving, purpose-built to close that gap. By combining inference optimizations, such as disaggregated inference and agentic routing, with production-grade deployment methods, Dynamo maximizes GPU utilization on Google Cloud's A4X Max, powered by NVIDIA GB300 NVL72.
In this session, we'll discuss deploying Dynamo on Google Kubernetes Engine (GKE) and share real-world benchmarks.
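For attendees who want a feel for what a GPU inference workload on GKE looks like, the shape of such a deployment can be sketched as a standard Kubernetes Deployment that requests GPU resources and targets a GPU node pool. This is a hypothetical minimal sketch, not the session's actual Dynamo deployment: the image, names, and accelerator type are placeholders.

```yaml
# Hypothetical sketch of a GPU inference workload on GKE.
# Image, names, and accelerator type are placeholders, not Dynamo specifics.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-worker
  template:
    metadata:
      labels:
        app: inference-worker
    spec:
      containers:
      - name: worker
        image: example.com/inference-worker:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1  # request one GPU so the pod lands on a GPU node
      nodeSelector:
        # Standard GKE node label for GPU node pools; accelerator name is an example.
        cloud.google.com/gke-accelerator: nvidia-l4
```

In practice the session's production setup (multi-node serving on A4X Max hardware) involves considerably more configuration; this fragment only illustrates the basic GKE scheduling mechanics.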
This Session is hosted by a Google Cloud Next Sponsor. Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.