Lightning Talks
DEVLT-223 • Gemini, Serverless • Technical
Scaling Gemini Inference
Developer Theater
3:30 PM - 3:55 PM
Want to process 100k+ Gemini requests now? The Gemini Batch API is designed for this, but jobs can take up to 24 hours to complete. What if you need to respond faster? Join this tactical session on building a high-throughput inference engine with Cloud Run. We share a real-world architecture for scaling instances, tuning CPU versus concurrency, and surviving "429 Rate Limit" storms. Learn to work within quotas and limits, optimize max_concurrency, and architect for massive parallelism using serverless containers.
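A common client-side pattern for surviving 429 storms is to cap the number of in-flight requests and retry rate-limited calls with exponential backoff and jitter. The Python sketch below is illustrative only and is not the architecture presented in the session. It assumes a hypothetical `call_model` coroutine that stands in for a real Gemini client and randomly simulates 429s, along with a hypothetical `RateLimitError` for the HTTP 429 response. Here `max_concurrency` is a client-side semaphore limit, not a Cloud Run setting.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the model endpoint."""


async def call_model(prompt: str, rng: random.Random, fail_rate: float = 0.3) -> str:
    # Hypothetical placeholder for a real Gemini request: it simulates
    # a 429 with probability `fail_rate` and otherwise echoes a response.
    await asyncio.sleep(0)
    if rng.random() < fail_rate:
        raise RateLimitError("429 Too Many Requests")
    return f"response to {prompt}"


async def call_with_backoff(prompt, sem, rng, max_retries=6, base_delay=0.01):
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        for attempt in range(max_retries + 1):
            try:
                return await call_model(prompt, rng)
            except RateLimitError:
                if attempt == max_retries:
                    raise  # give up after the final retry
                # Exponential backoff with full jitter: sleep a random time
                # in [0, base_delay * 2^attempt] before retrying.
                await asyncio.sleep(rng.uniform(0, base_delay * 2 ** attempt))


async def run_batch(prompts, max_concurrency=8, seed=0):
    sem = asyncio.Semaphore(max_concurrency)
    rng = random.Random(seed)
    tasks = [call_with_backoff(p, sem, rng) for p in prompts]
    # gather keeps results in input order; failures come back as exceptions
    # instead of cancelling the whole batch.
    return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    prompts = [f"prompt-{i}" for i in range(100)]
    results = asyncio.run(run_batch(prompts))
    ok = sum(isinstance(r, str) for r in results)
    print(f"{ok}/{len(prompts)} succeeded")
```

Each task holds its semaphore slot while it backs off. As a result, retries never push the number of concurrent requests above the cap, which is usually the behavior you want while the endpoint is already rejecting traffic. In a real deployment, the cap and backoff parameters would need tuning against your actual quotas.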