vercel-ai-sdk

TypeScript · ToolLoopAgent
Model: gemini-2.5-flash
Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.662 (27 scored · 27/30 valid)
Hit@1: 77.8% (JustifQ 2.89/5)
Latency p50: 21.2s (p95 28.4s)
Mean tokens: 1,833 (1,605 in · 228 out)
Cost / run: $0.0060 (9.1 avg tool calls)

All metrics

count_total: 30
count_valid: 27
success_rate: 90.0%
latency_p50: 21.227s
latency_p95: 28.433s
latency_mean: 20.253s
latency_max: 45.544s
mean_input_tokens: 1,605
mean_output_tokens: 228
mean_tool_calls: 9.07
estimated_cost_usd_per_run: $0.005951
mean_ndcg_at_3: 0.662
hit_at_1_rate: 77.8%
mean_precision_at_3: 0.370
mean_recall_at_3: 0.519
n_scored: 27
mean_justification_quality: 2.89/5
mean_judge_score: 13.07/20
judge_n: 27
hit_step_limit_rate: 0.0%
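
Some of the figures above appear to be derived from others. The sketch below checks two of them, under stated assumptions: that success_rate is count_valid divided by count_total, and that the headline "Mean tokens" is the sum of mean input and mean output tokens. The page does not state either formula, and the function names are hypothetical, chosen only for this illustration.

```typescript
// Values copied from the metrics listed above.
const countTotal = 30;
const countValid = 27;
const meanInputTokens = 1605;
const meanOutputTokens = 228;

// Assumption: success_rate = count_valid / count_total.
function successRate(valid: number, total: number): number {
  return valid / total;
}

// Assumption: "Mean tokens" = mean input tokens + mean output tokens.
function meanTotalTokens(input: number, output: number): number {
  return input + output;
}

// Format a fraction as a percentage with one decimal place.
function toPercent(fraction: number): string {
  return `${(fraction * 100).toFixed(1)}%`;
}

const successRateText = toPercent(successRate(countValid, countTotal));
const totalTokens = meanTotalTokens(meanInputTokens, meanOutputTokens);

console.log(successRateText); // "90.0%"
console.log(totalTokens); // 1833
```

Both results match the reported success_rate of 90.0% and the headline mean of 1,833 tokens, which is consistent with the assumed formulas but does not confirm them.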