pydantic-ai
Python
Agent + @agent.tool_plain
Model: gemini-2.5-flash · Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.857 (29 scored · 29/30 valid)
Hit@1: 100.0%
JustifQ: 3.41/5
Latency p50: 16.2s (p95 31.5s)
Mean tokens: 6,629 (6,149 in · 480 out)
Cost / run: $0.0181
Avg tool calls: 8.4
All metrics
count_total: 30
count_valid: 29
success_rate: 96.7%
latency_p50: 16.196s
latency_p95: 31.534s
latency_mean: 38.104s
latency_max: 638.516s
mean_input_tokens: 6,149
mean_output_tokens: 480
mean_tool_calls: 8.41
estimated_cost_usd_per_run: $0.018062
mean_ndcg_at_3: 0.857
hit_at_1_rate: 100.0%
mean_precision_at_3: 0.494
mean_recall_at_3: 0.575
n_scored: 29
mean_justification_quality: 3.41/5
mean_judge_score: 15.31/20
judge_n: 29
hit_step_limit_rate: 0.0%
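
The report does not state how the ranking scores (mean_ndcg_at_3, hit_at_1_rate, mean_precision_at_3, mean_recall_at_3) are computed. As a rough guide to reading them, here is a minimal sketch of the conventional per-run definitions, assuming graded relevance with linear gain and a log2 rank discount. The function names and the example data are hypothetical and are not taken from the benchmark's code; the reported values would be the mean of such per-run scores over the scored runs.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=3):
    """DCG of the returned ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


def precision_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of the k ranked slots filled by relevant items."""
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k=3):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)


def hit_at_1(ranked_ids, relevant_ids):
    """True when the first ranked item is relevant."""
    return bool(ranked_ids) and ranked_ids[0] in relevant_ids


# Hypothetical single run: the agent returns three items, four are relevant.
gains = {"a": 3, "c": 1, "d": 2, "e": 1}  # assumed graded relevance labels
ranked = ["a", "b", "c"]                   # agent's top-3 answer
ranked_rels = [gains.get(item, 0) for item in ranked]

print("NDCG@3     ", round(ndcg_at_k(ranked_rels, list(gains.values())), 3))
print("Precision@3", round(precision_at_k(ranked, set(gains)), 3))
print("Recall@3   ", round(recall_at_k(ranked, set(gains)), 3))
print("Hit@1      ", hit_at_1(ranked, set(gains)))
```

Under these definitions, a Hit@1 of 100% alongside a Recall@3 near 0.58 is not contradictory: the top answer can always be relevant while some relevant items still fall outside the top three.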