
pydantic-ai

Python · Agent + @agent.tool_plain
Model: gemini-2.5-flash
Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.857 (29 scored · 29/30 valid)
Hit@1: 100.0% (JustifQ 3.41/5)
Latency p50: 16.2s (p95 31.5s)
Mean tokens: 6,629 (6,149 in · 480 out)
Cost / run: $0.0181 (8.4 avg tool calls)
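
The summary figures can be cross-checked against each other. The following is a minimal sketch that uses only the values shown above; the variable names are illustrative and are not taken from the benchmark's own code.

```python
# Values copied from the summary above.
count_total = 30
count_valid = 29
mean_input_tokens = 6149
mean_output_tokens = 480

# Share of runs that produced a valid result.
success_rate = count_valid / count_total

# Mean total tokens per run is the sum of the input and output means.
mean_total_tokens = mean_input_tokens + mean_output_tokens

print(f"success rate: {success_rate:.1%}")   # success rate: 96.7%
print(f"mean tokens:  {mean_total_tokens:,}")  # mean tokens:  6,629
```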

All metrics

count_total: 30
count_valid: 29
success_rate: 96.7%
latency_p50: 16.196s
latency_p95: 31.534s
latency_mean: 38.104s
latency_max: 638.516s
mean_input_tokens: 6,149
mean_output_tokens: 480
mean_tool_calls: 8.41
estimated_cost_usd_per_run: $0.018062
mean_ndcg_at_3: 0.857
hit_at_1_rate: 100.0%
mean_precision_at_3: 0.494
mean_recall_at_3: 0.575
n_scored: 29
mean_justification_quality: 3.41/5
mean_judge_score: 15.31/20
judge_n: 29
hit_step_limit_rate: 0.0%
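
This page does not state how the ranking metrics are computed. As a rough guide, the sketch below shows one common definition of NDCG@k (linear gain, log2 discount) together with a simple Hit@1 check. The function names, the graded-relevance inputs, and the choice of gain function are assumptions made for illustration; the benchmark may use a different variant.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=3):
    """DCG of the returned ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(all_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items exist, so the score is defined as 0 here
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

def hit_at_1(ranked_relevances):
    """True when the top-ranked item has non-zero relevance."""
    return bool(ranked_relevances) and ranked_relevances[0] > 0

# Hypothetical relevance grades for one query (not benchmark data).
perfect = ndcg_at_k([3, 2, 1], [3, 2, 1, 0])
shuffled = ndcg_at_k([1, 3, 2], [3, 2, 1, 0])
```

Under this definition a ranking can score Hit@1 on every query and still fall short of an NDCG@3 of 1.0, because positions 2 and 3 also contribute to the score.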