Mockup for review. Tech-stack demonstration. Not affiliated with Nebius and not the live Builders Network.
tokenfactory
Featured

AgentScope Evals

Live evaluation harness for production agents.

About this project

Eval-as-you-go: run regression tests against your live agent traffic, replay failed conversations, and A/B test prompts. The eval LLM is powered by Nebius.
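
As a rough illustration of the replay-and-compare idea described above, here is a minimal Python sketch. It is not the AgentScope Evals API: `Trace`, `pass_rate`, `ab_compare`, and `stub_agent` are hypothetical names, and a simple substring check stands in for the eval LLM that would judge replies in a real setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """One recorded conversation turn (hypothetical shape, not a real schema)."""
    user_input: str
    expected_substring: str  # text a passing reply must contain


# An agent is any callable mapping (prompt, user_input) to a reply string.
Agent = Callable[[str, str], str]


def pass_rate(agent: Agent, prompt: str, traces: list[Trace]) -> float:
    """Replay every trace through the agent under one prompt.

    Returns the fraction of replies that pass. The substring check is an
    assumption standing in for an LLM-based judge.
    """
    if not traces:
        return 0.0
    passed = sum(
        1
        for t in traces
        if t.expected_substring.lower() in agent(prompt, t.user_input).lower()
    )
    return passed / len(traces)


def ab_compare(
    agent: Agent, prompt_a: str, prompt_b: str, traces: list[Trace]
) -> dict[str, float]:
    """Score two prompt variants on the same replayed traces."""
    return {
        "A": pass_rate(agent, prompt_a, traces),
        "B": pass_rate(agent, prompt_b, traces),
    }


def stub_agent(prompt: str, user_input: str) -> str:
    """Stand-in for a real agent: adds a courtesy line only if the prompt asks."""
    if "polite" in prompt:
        return f"Thank you for asking. You said: {user_input}"
    return f"You said: {user_input}"


# Two recorded turns, replayed under two prompt variants.
traces = [
    Trace("refund my order", "thank you"),
    Trace("reset password", "reset password"),
]
result = ab_compare(stub_agent, "Be terse.", "Be polite.", traces)
print(result)
```

Scoring both prompts on the same recorded traces keeps the comparison like-for-like; a production harness would presumably swap the stub for the deployed agent and the substring check for an LLM judge.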

Technologies

evals
agents
observability
624 stars on GitHub