Breakouts
BRK2-245 • Compute, Kubernetes, Storage, Open Models, Agents • Technical
Scaling inference for reasoning models & agents with NVIDIA on GKE
Location: Surf C
Time: 5:00 PM - 5:45 PM
As AI models scale beyond single-GPU capacity, the gap between peak hardware performance and real-world serving widens. NVIDIA Dynamo is an open-source inference framework for large-scale serving, purpose-built to close that gap. By combining inference optimizations, such as disaggregated inference and agentic routing, with production-grade deployment methods, Dynamo maximizes GPU utilization on Google Cloud's A4X Max, powered by NVIDIA GB300 NVL72.
In this session, we'll discuss deploying Dynamo on Google Kubernetes Engine (GKE) and share real-world benchmarks.
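For attendees who want a feel for what a GPU inference workload on GKE looks like, the shape of such a deployment can be sketched as a standard Kubernetes Deployment that requests GPU resources and targets a GPU node pool. This is a hypothetical minimal sketch, not the session's actual Dynamo deployment: the image, names, and accelerator type are placeholders.

```yaml
# Hypothetical sketch of a GPU inference workload on GKE.
# Image, names, and accelerator type are placeholders, not Dynamo specifics.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-worker
  template:
    metadata:
      labels:
        app: inference-worker
    spec:
      containers:
      - name: worker
        image: example.com/inference-worker:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1  # request one GPU so the pod lands on a GPU node
      nodeSelector:
        # Standard GKE node label for GPU node pools; accelerator name is an example.
        cloud.google.com/gke-accelerator: nvidia-l4
```

In practice the session's production setup (multi-node serving on A4X Max hardware) involves considerably more configuration; this fragment only illustrates the basic GKE scheduling mechanics.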
This Session is hosted by a Google Cloud Next Sponsor. Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.