crewai

Python · Agent + Task + Crew
Model: gemini-2.5-flash
Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.598 (26 scored · 26/30 valid)
Hit@1: 69.2% (JustifQ 3.27/5)
Latency p50: 18.7s (p95 31.6s)
Mean tokens: 44,591 (42,785 in · 1,806 out)
Cost / run: $0.1072 (11.8 avg tool calls)

All metrics

count_total: 30
count_valid: 26
success_rate: 86.7%
latency_p50: 18.667s
latency_p95: 31.617s
latency_mean: 19.082s
latency_max: 38.676s
mean_input_tokens: 42,785
mean_output_tokens: 1,806
mean_tool_calls: 11.85
estimated_cost_usd_per_run: $0.107246
mean_ndcg_at_3: 0.598
hit_at_1_rate: 69.2%
mean_precision_at_3: 0.385
mean_recall_at_3: 0.462
n_scored: 26
mean_justification_quality: 3.27/5
mean_judge_score: 13.81/20
judge_n: 26
hit_step_limit_rate: 0.0%
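
The page reports the ranking metrics without saying how they are computed. As a rough illustration only, the sketch below shows one common way to compute NDCG@3, Hit@1, precision@3 and recall@3 for a single run. It assumes binary relevance (an item is either relevant or not), and every function name here is hypothetical rather than taken from the benchmark's code. The final lines recompute two derived figures from the values listed above.

```python
import math
from typing import Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    # Assumption: binary relevance, gain 1.0 for a relevant item and 0.0 otherwise.
    gains = [1.0 if item in relevant else 0.0 for item in ranked]
    ideal = [1.0] * min(len(relevant), k)  # best possible ordering of the top k
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def hit_at_1(ranked: Sequence[str], relevant: Set[str]) -> float:
    # 1.0 when the top-ranked item is relevant, else 0.0.
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    # Fraction of the top k slots filled by relevant items.
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    # Fraction of all relevant items that appear in the top k.
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


# Hypothetical single run: three ranked answers, three relevant items.
ranked = ["a", "x", "b"]
relevant = {"a", "b", "c"}
run_ndcg = ndcg_at_k(ranked, relevant)
run_hit = hit_at_1(ranked, relevant)
run_precision = precision_at_k(ranked, relevant)
run_recall = recall_at_k(ranked, relevant)

# Derived figures recomputed from the values reported on this page.
success_rate = 26 / 30           # count_valid / count_total
mean_total_tokens = 42785 + 1806  # mean input + mean output tokens
```

Averaging the per-run values over the 26 scored runs would give figures comparable to mean_ndcg_at_3, hit_at_1_rate, mean_precision_at_3 and mean_recall_at_3, provided the benchmark uses the same definitions, which this page does not confirm.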