Custom Agent Evaluations

Evaluate any AI agent (LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI, Anthropic, HTTP endpoints, or plain Python) using AgentX as a scoring and reporting backend. Your agent runs locally; AgentX scores the results and generates a full analysis report.
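Because AgentX only scores responses, the agent itself can be anything you can call with a query and get text back. A minimal sketch, assuming every agent is reduced to a plain Python function `str -> str`. `AgentFn`, `my_python_agent`, and `http_agent` are hypothetical names, and the endpoint's JSON shape (`{"query": ...}` in, `{"response": ...}` out) is an assumption for illustration, not the AgentX contract.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical common interface: any agent becomes "query in, response text out".
AgentFn = Callable[[str], str]


def my_python_agent(query: str) -> str:
    """A plain Python agent. In practice this body would call your framework's own API."""
    return f"Echo: {query}"


def http_agent(url: str, timeout: float = 30.0) -> AgentFn:
    """Wrap an HTTP endpoint as an AgentFn.

    Assumption: the endpoint accepts a JSON POST body {"query": ...}
    and replies with JSON containing a "response" field.
    """
    def call(query: str) -> str:
        body = json.dumps({"query": query}).encode("utf-8")
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))["response"]

    return call
```

Keeping the interface this narrow means the evaluation loop never needs to know which framework produced the answer; adapting a new framework is a matter of writing one small wrapper function.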

How it works

1. Build a dataset: define cases (queries plus acceptance/rejection criteria).
2. Run your agent: the SDK calls your function or endpoint for each case.
3. Finalize and analyze: AgentX scores every response and generates a report.
4. View results: in the terminal and on the AgentX dashboard.

Your agent (local)  →  AgentX SDK  →  AgentX API (scores + analyzes)  →  Report
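The local half of this flow (build a dataset, run your agent, collect the responses) can be sketched in plain Python. This is a minimal sketch, not the AgentX SDK: `Case`, `Result`, `run_agent`, and `to_payload` are hypothetical names and the field layout is an assumption. The upload and scoring step is left out, since that happens in the AgentX API through the SDK's own calls, which this sketch does not reproduce.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    """One evaluation case: a query plus the criteria a scorer would check."""
    query: str
    acceptance: list[str] = field(default_factory=list)  # what a good answer should do
    rejection: list[str] = field(default_factory=list)   # what a good answer must not do


@dataclass
class Result:
    """The agent's output for one case, or the error it raised."""
    case: Case
    response: Optional[str]
    error: Optional[str] = None


def run_agent(agent: Callable[[str], str], dataset: list[Case]) -> list[Result]:
    """Call the agent locally once per case, recording failures instead of aborting."""
    results = []
    for case in dataset:
        try:
            results.append(Result(case, agent(case.query)))
        except Exception as exc:  # one failing case should not discard the whole run
            results.append(Result(case, None, repr(exc)))
    return results


def to_payload(results: list[Result]) -> list[dict]:
    """Serialize results into plain dicts (nested Case included) ready for upload."""
    return [asdict(r) for r in results]


# Hypothetical usage with a toy agent.
dataset = [
    Case("What is 2 + 2?", acceptance=["Answers 4"], rejection=["Gives any other number"]),
]
payload = to_payload(run_agent(lambda q: "4", dataset))
```

Recording an exception as a result rather than raising keeps the rest of the run intact; whether and how AgentX scores errored cases is determined by its API, not by this sketch.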