Lightning Talks
DEVLT-224 • Technical
Building an Agentic Platform using Ray Serve LLM and vLLM on GKE
Location: Developer Theater
Time: 4:00 PM - 4:25 PM
Discover how to deploy a Qwen model on Google Kubernetes Engine (GKE) using Ray Serve and vLLM for high-throughput, low-latency inference. This session provides a guide to integrating an ADK agent for sophisticated chat and tool usage, leveraging TPU-enabled nodes for intensive workloads. Explore Ray-native features for autoscaling and fault tolerance, and gain a blueprint for turning LLMs into dynamic agentic systems, a key requirement for enterprises building next-generation AI applications.