Lightning Talks
DEVLT-301 • Architecture, Compute, Storage • Advanced Technical
LLM Inference on GKE for the rest of us
Developer Theater
3:00 PM - 3:25 PM
Learn to deploy LLMs efficiently without a hyperscale budget. This session covers practical strategies for optimizing LLM inference on Kubernetes, balancing performance, scalability, and cost. We’ll dive into container and model optimization, accelerator management, storage, load balancing, and observability. Walk away with actionable tools to maximize the cost-to-performance ratio of your AI workloads.