Discussion Groups
BRK2-171A-DG • Compute • Advanced Technical
TPU inference: Portability, cost, and scaling for state of the art
Location: Reef D
Time: 11:45 AM - 12:15 PM
TPU inference doesn't have to mean writing custom compilers. Join this collaborative discussion to exchange strategies for deploying vLLM and other frameworks across Google Cloud hardware. We will dissect real-world implementations, debating the business tradeoffs between time-to-first-token latency and raw throughput, the reality of moving workloads between GPUs and TPUs, and ways to overcome the production friction of cold starts and OOMs. Bring your deployment headaches and swap battle stories with fellow ML engineers and platform architects.
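As a concrete starting point for the discussion, a minimal vLLM offline-inference launch might look like the sketch below. It assumes a Cloud TPU VM with a TPU-enabled vLLM build installed; the model name and sampling settings are illustrative, and the same code runs unchanged on GPUs, which is the portability point the session explores.

```python
from vllm import LLM, SamplingParams

# Illustrative model and settings; vLLM picks the accelerator backend
# (TPU or GPU) based on the installed build, so this script is portable.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", max_model_len=4096)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain TPU inference in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```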