Session Details: Google Cloud Next 2026


Day 2 – April 23, 2026

Lightning Talks

CUSTLT-108 • Kubernetes, Cloud Runtimes • Technical

Optimize AI for less: How Moloco optimized TPU on GKE for maximum performance gain

Customer Theater

12:15 PM - 12:35 PM

In lightning-fast AI markets, price-performance is the ultimate competitive advantage. This lightning talk reveals how Moloco, a leader in operational machine learning, transformed its GKE infrastructure to achieve peak efficiency for Deep Learning Recommendation Models (DLRM).

We’ll dive into the co-innovation between Moloco and Google to optimize Trillium (6th Gen TPU) specifically for embedding-heavy workloads. Discover how they leveraged advanced embedding lookups and Trillium’s specialized SparseCore to slash latency and costs. Attendees will walk away with a proven optimization playbook and a reference architecture to scale high-throughput, low-latency recommendation engines on GKE TP
