Discussion Groups
BRK2-171B-DG • Compute, Open Models • Technical
Diagnose and optimize: A guide to ML workload observability on TPUs
Location: Reef A
Time: 5:15 PM - 5:45 PM
Diagnosing and optimizing ML workloads on TPUs is complicated and requires visibility all the way from your ML workload down to the TPUs. In this discussion group, we'll introduce new capabilities for diagnosing workloads on TPUs, including the ML Diagnostics platform, Managed Xprof profiling, and Workload Monitoring. These capabilities come from tooling that internal Google platforms such as Gemini, Search, and YouTube use to diagnose their own TPU workloads. Join this deep dive to learn how these tools can help your workloads run at their best on TPUs.