Examples

All examples follow the same pattern: wrap the call to your framework in a function that accepts an EvaluationCase and returns a str, dict, or EvaluationResult.


Python callable

from agentx.evaluations.models import EvaluationCase

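def my_agent(case: EvaluationCase) -> str:
    # Receives one evaluation case and returns the answer as a plain string.
    return f"Answer to: {case.query}"

# `client` is not defined in this snippet; it is assumed to be an
# already-initialized client object created elsewhere.
report = (
    client.evaluations
    .run(
        dataset_id="...",
        subject={
            "kind": "custom_agent",
            "displayName": "My Bot",
            "framework": "raw_python",
        },
    )
    .execute(my_agent)
    .finalize()
    .analyze()
)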

Full example: basic_callable_eval
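
The wrapping pattern described at the top of this section can be tried without the SDK installed. The sketch below is illustrative only: StubCase is a local stand-in for EvaluationCase that carries just the query attribute used above, and wrap_framework and toy_framework are hypothetical names, not part of any library. It shows one way to adapt a framework entry point that takes a prompt into a function that takes a case and returns a str.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StubCase:
    """Local stand-in for EvaluationCase; only `query` is taken from the docs."""
    query: str


def wrap_framework(run: Callable[[str], Any]) -> Callable[[StubCase], str]:
    """Adapt a hypothetical `run(prompt)` entry point to the case -> str shape."""
    def agent(case: StubCase) -> str:
        raw = run(case.query)
        # Normalize whatever the framework returned into a plain string.
        return raw if isinstance(raw, str) else str(raw)
    return agent


def toy_framework(prompt: str) -> dict:
    # Hypothetical framework call; a real framework would invoke a model here.
    return {"text": prompt.upper()}


# Build the callable by extracting the text field from the framework's output.
agent = wrap_framework(lambda q: toy_framework(q)["text"])
print(agent(StubCase(query="hello")))
```

If the real EvaluationCase exposes query in the same way, the returned agent function should be usable wherever my_agent is passed above; that compatibility is an assumption and has not been checked against the SDK.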