Engineer the agent-quality flywheel: Using Gemini Enterprise Agent Platform evaluations to optimize agents
Treating agent quality as a rigorous engineering discipline is the only way to scale. Stop guessing and start measuring. Join this session for a deep dive into the state-of-the-art techniques we use at Google to build our own agents, and learn how to make those techniques part of your process. We’ll demonstrate how you can adopt the “Quality Flywheel” methodology: bootstrapping effective offline evaluation with synthetic test generation, using LLM-as-a-judge autoraters and trajectory evaluations, simulating users and environments, identifying systemic failures in production with multi-turn autoraters and loss-pattern clustering, aligning test coverage with actual usage, and using automated optimization capabilities to scientifically refine performance. Ramp up with confidence.