Session Details: Google Cloud Next 2026

Cancelled

CXL-DEVLT-219 • App Dev, Cost Optimization, Gemini Enterprise Agent Platform • Technical

GPU virtualization and orchestration for heterogeneous LLM workloads in Kubernetes

Modern LLM pipelines in Kubernetes often suffer from GPU waste or contention. This session explores advanced virtualization techniques for improving GPU utilization. We cover NVIDIA Multi-Instance GPU (MIG), software-based GPU sharing, and scheduling strategies built on the NVIDIA GPU Operator and custom schedulers. We demonstrate dynamic MIG provisioning, pod optimization, and QoS enforcement. Learn to design scalable, multi-tenant clusters that maximize throughput and cut costs for diverse LLM workloads in production.
