Custom Agent Evaluations
Evaluate **any AI agent** (LangChain, CrewAI, AutoGen, LlamaIndex, OpenAI, Anthropic, HTTP endpoints, or plain Python) using AgentX as a scoring and reporting backend. Your agent runs locally; AgentX scores the results and generates a full analysis report.
How it works
1. Build a dataset: define cases (queries plus acceptance/rejection criteria).
2. Run your agent: the SDK calls your function or endpoint for each case.
3. Finalize and analyze: AgentX scores every response and generates a report.
4. View results: in the terminal and on the AgentX dashboard.
Your agent (local) → AgentX SDK → AgentX API (scores + analyzes) → Report
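The local half of this pipeline (steps 1 and 2) amounts to a list of cases and a loop that calls your agent once per case. This page does not show the AgentX SDK's actual classes or methods, so the sketch below is a plain-Python illustration only. `Case`, `Result`, `run_cases`, and `to_payload` are hypothetical names, not AgentX APIs. The sketch assumes your agent can be wrapped as a function that takes a query string and returns a response string. Scoring and report generation happen on the AgentX API and are not modeled here.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional


@dataclass
class Case:
    """Hypothetical case shape: a query plus criteria a remote scorer would check."""
    query: str
    accept: list[str] = field(default_factory=list)  # acceptance criteria
    reject: list[str] = field(default_factory=list)  # rejection criteria


@dataclass
class Result:
    """One agent run: the case, the agent's response, and any error raised."""
    case: Case
    response: Optional[str]
    error: Optional[str] = None


def run_cases(agent: Callable[[str], str], cases: list[Case]) -> list[Result]:
    """Call the agent once per case; record exceptions instead of aborting the run."""
    results = []
    for case in cases:
        try:
            results.append(Result(case, agent(case.query)))
        except Exception as exc:
            results.append(Result(case, None, error=repr(exc)))
    return results


def to_payload(results: list[Result]) -> str:
    """Serialize results to JSON (asdict also converts the nested Case)."""
    return json.dumps([asdict(r) for r in results])


if __name__ == "__main__":
    # Any callable works here: a plain function, or a thin wrapper around a
    # framework agent or HTTP endpoint.
    def echo_agent(query: str) -> str:
        return f"You asked: {query}"

    cases = [
        Case("What is 2 + 2?", accept=["answers 4"], reject=["refuses"]),
        Case("Say hi", accept=["greets the user"]),
    ]
    print(to_payload(run_cases(echo_agent, cases)))
```

If a single case fails, its exception is captured as data, so the rest of the dataset still runs and the failure can still be reported alongside the successful responses.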
