baseline-typescript

TypeScript · Hand-rolled tool loop
Model: gemini-2.5-flash
Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.589 (29 scored · 29/30 valid)
Hit@1: 69.0% (JustifQ 3.00/5)
Latency p50: 20.5s (p95 32.7s)
Mean tokens: 6,392 (5,897 in · 495 out)
Cost / run: $0.0177 (9.2 avg tool calls)

All metrics

count_total: 30
count_valid: 29
success_rate: 96.7%
latency_p50: 20.522s
latency_p95: 32.668s
latency_mean: 21.620s
latency_max: 43.858s
mean_input_tokens: 5,897
mean_output_tokens: 495
mean_tool_calls: 9.21
estimated_cost_usd_per_run: $0.017744
mean_ndcg_at_3: 0.589
hit_at_1_rate: 69.0%
mean_precision_at_3: 0.356
mean_recall_at_3: 0.517
n_scored: 29
mean_justification_quality: 3.00/5
mean_judge_score: 13.41/20
judge_n: 29
hit_step_limit_rate: 0.0%
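
Two of the figures above can be reproduced from the others. The sketch below is a minimal consistency check, not the leaderboard's own code. It assumes that success_rate is count_valid divided by count_total and that the "Mean tokens" headline is the sum of mean input and mean output tokens. The helper `toPercent` and all variable names are hypothetical.

```typescript
// Values copied from the metrics listed above.
const countTotal = 30;
const countValid = 29;
const meanInputTokens = 5897;
const meanOutputTokens = 495;

// Assumption: success_rate = count_valid / count_total.
const successRate = countValid / countTotal;

// Assumption: the "Mean tokens" headline = mean input + mean output.
const meanTotalTokens = meanInputTokens + meanOutputTokens;

// Hypothetical helper: format a fraction as a percentage with one decimal place.
function toPercent(fraction: number): string {
  return `${(fraction * 100).toFixed(1)}%`;
}

console.log(toPercent(successRate)); // prints "96.7%"
console.log(meanTotalTokens);        // prints 6392
```

Both results match the reported success_rate of 96.7% and the 6,392 mean tokens, which is consistent with the two assumptions.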