Lightning Talks
CUSTLT-108 • Kubernetes, Cloud Runtimes • Technical
Optimize AI for less: How Moloco optimized TPU on GKE for maximum performance gain
Location: Customer Theater
Time: 12:15 PM - 12:35 PM
In fast-moving AI markets, price-performance is a decisive competitive advantage. This lightning talk shows how Moloco, a leader in operational machine learning, transformed its GKE infrastructure to achieve peak efficiency for Deep Learning Recommendation Models (DLRM).
We’ll dive into the co-innovation between Moloco and Google to optimize Trillium (6th Gen TPU) specifically for embedding-heavy workloads. Discover how they leveraged advanced embedding lookups and Trillium’s specialized SparseCore to slash latency and costs. Attendees will walk away with a proven optimization playbook and a reference architecture to scale high-throughput, low-latency recommendation engines on GKE TP