Discussion Groups
BRK2-171A-DG • Compute • Advanced Technical
TPU inference: Portability, cost, and scaling for state of the art
Location: Reef D
Time: 11:45 AM - 12:15 PM
TPU inference doesn't have to mean writing custom compilers. Join this collaborative discussion to exchange strategies for deploying vLLM and other frameworks across Google Cloud hardware. We will dissect real-world implementations, debating the business tradeoffs between time-to-first-token latency and raw throughput, the reality of moving workloads between GPUs and TPUs, and ways to overcome the production friction of cold starts and OOMs. Bring your deployment headaches and swap battle stories with fellow ML engineers and platform architects.
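As a concrete starting point for the discussion, a minimal vLLM offline-inference launch might look like the sketch below. It assumes a Cloud TPU VM with a TPU-enabled vLLM build installed; the model name and sampling settings are illustrative, and the same code runs unchanged on GPUs, which is the portability point the session explores.

```python
from vllm import LLM, SamplingParams

# Illustrative model and settings; vLLM picks the accelerator backend
# (TPU or GPU) based on the installed build, so this script is portable.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", max_model_len=4096)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain TPU inference in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```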