Lightning Talks
EXPOLT200 • Applied AI, Agents • Technical
Demystifying evals at the frontier of agentic development
Location: Expo Theater 2
Time: 1:00 PM - 1:20 PM
Teams without evals get stuck in reactive loops—catching issues in production, unable to distinguish regressions from noise. Teams that invest early find development accelerates as failures become test cases and metrics replace guesswork. This talk shares what Anthropic has learned building evals for Claude Code and deploying them with customers across coding, conversational, and research agents. You'll leave with a roadmap: structuring agent evals, choosing the right graders, balancing capability vs. regression testing, and building a suite you trust.
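The ideas the talk previews, turning production failures into test cases and letting graders replace guesswork, can be sketched in miniature. This is a hypothetical harness for illustration, not Anthropic's actual tooling; `EvalCase`, `run_suite`, and the stub agent are all assumed names.

```python
# Minimal sketch of an agent eval harness (hypothetical names; not
# Anthropic's actual tooling). Each past failure becomes a test case,
# and a grader turns guesswork into a pass/fail metric.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    grader: Callable[[str], bool]  # returns True if the output passes

def run_suite(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and report the pass rate."""
    passed = sum(case.grader(agent(case.prompt)) for case in cases)
    return passed / len(cases)

# A failure once seen in production, captured as a regression test:
cases = [
    EvalCase(
        prompt="List the files changed in the last commit.",
        # Code-based grader: cheap and deterministic, suited to
        # regression testing rather than open-ended capability testing.
        grader=lambda out: "git" in out.lower(),
    ),
]

# Stub standing in for a real coding agent.
stub_agent = lambda prompt: "Run `git diff --name-only HEAD~1` to see them."
print(f"pass rate: {run_suite(stub_agent, cases):.0%}")
```

Code-based graders like the one above are one end of the spectrum the talk covers; subjective qualities (helpfulness, tone) typically need model-based graders instead.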
By attending this session, your contact information may be shared with the sponsor for relevant follow-up related to this event only.