agent_eval

Agents for running the AgentEval pipeline.

AgentEval is a process for evaluating an LLM-based system's performance on a given task.

When given a task to evaluate and a few example runs, the critic and subcritic agents generate evaluation criteria for assessing a system's solution. Once the criteria have been created, the quantifier agent can evaluate subsequent task solutions against them.
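
In code, the end-to-end flow might look like the sketch below. This is a hedged illustration only: the module path `os1.agentchat.contrib.agent_eval`, the `Task` model, and the `generate_criteria` / `quantify_criteria` helpers are assumptions modelled on AutoGen's AgentEval module and may differ from this package's actual exports.

```python
# A minimal sketch of the AgentEval flow. Module path and helper names are
# assumptions (modelled on AutoGen's AgentEval); adjust to the real API.
from os1.agentchat.contrib.agent_eval.agent_eval import generate_criteria, quantify_criteria
from os1.agentchat.contrib.agent_eval.task import Task

llm_config = {"config_list": [{"model": "gpt-4"}]}  # hypothetical LLM config

# Describe the task along with example successful and failed runs.
task = Task(
    name="math_problem_solving",
    description="Solve grade-school math word problems step by step.",
    successful_response="...transcript of a correct solution...",
    failed_response="...transcript of an incorrect solution...",
)

# 1. Critic (and optionally subcritic) agents propose evaluation criteria.
criteria = generate_criteria(llm_config=llm_config, task=task, use_subcritic=True)

# 2. The quantifier agent scores a new solution against those criteria.
result = quantify_criteria(
    llm_config=llm_config,
    criteria=criteria,
    task=task,
    test_case="...transcript of the solution to be evaluated...",
    ground_truth="42",
)
print(result)
```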

See our AgentEval blog post for usage examples and general explanations.