agent_eval

Agents for running the AgentEval pipeline.

AgentEval is a process for evaluating an LLM-based system's performance on a given task.

When given a task to evaluate and a few example runs, the critic and subcritic agents create criteria for evaluating a system's solution. Once the criteria have been created, the quantifier agent can score subsequent task solutions against them.
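For orientation, here is a minimal sketch of how this pipeline might be driven. It assumes the `generate_criteria` and `quantify_criteria` helpers and the `Task` model from AutoGen's `autogen.agentchat.contrib.agent_eval` package; names, fields, and return shapes may differ in your installed version, and the `llm_config` and task examples below are placeholders.

```python
# Sketch of the AgentEval flow: generate criteria from example runs,
# then quantify a new solution against those criteria.
from autogen.agentchat.contrib.agent_eval.agent_eval import (
    generate_criteria,
    quantify_criteria,
)
from autogen.agentchat.contrib.agent_eval.task import Task

# Placeholder LLM configuration.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

# Describe the task with one successful and one failed example run.
task = Task(
    name="math_problem_solving",
    description="Solve grade-school math word problems.",
    successful_response="The answer is 7, because 3 + 4 = 7.",
    failed_response="I am not sure how to solve this.",
)

# Critic (and, optionally, subcritic) agents propose evaluation criteria.
criteria = generate_criteria(
    llm_config=llm_config,
    task=task,
    use_subcritic=True,
)

# Quantifier agent scores a new solution against the generated criteria.
result = quantify_criteria(
    llm_config=llm_config,
    criteria=criteria,
    task=task,
    test_case="The answer is 7, because 3 + 4 = 7.",
    ground_truth="7",
)
print(result)
```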

See our blog post for usage examples and general explanations.