baseline-typescript
TypeScript · Hand-rolled tool loop
Model: gemini-2.5-flash · Generated: May 11, 2026, 12:00 AM UTC

NDCG@3: 0.589 (29 scored · 29/30 valid)
Hit@1: 69.0% (JustifQ 3.00/5)
Latency p50: 20.5s (p95 32.7s)
Mean tokens: 6,392 (5,897 in · 495 out)
Cost / run: $0.0177 (9.2 avg tool calls)
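
The headline figures above appear to be simple derivations from the raw counts: 5,897 input plus 495 output tokens gives the 6,392 mean, and 29 valid runs out of 30 gives a valid rate of about 96.7%. A minimal sketch of that arithmetic follows; the `RunSummary` type and `deriveHeadline` function are hypothetical names for illustration, not part of the benchmark harness.

```typescript
// Hypothetical shape for the raw counts shown in the report.
interface RunSummary {
  countTotal: number;
  countValid: number;
  meanInputTokens: number;
  meanOutputTokens: number;
}

// Derive the headline values (assumed to be plain sums and ratios).
function deriveHeadline(s: RunSummary): { meanTokens: number; validRatePct: string } {
  const meanTokens = s.meanInputTokens + s.meanOutputTokens;
  const validRatePct = ((s.countValid / s.countTotal) * 100).toFixed(1) + "%";
  return { meanTokens, validRatePct };
}

// Values copied from the report.
const headline = deriveHeadline({
  countTotal: 30,
  countValid: 29,
  meanInputTokens: 5897,
  meanOutputTokens: 495,
});
console.log(headline.meanTokens, headline.validRatePct);
```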
All metrics
count_total: 30
count_valid: 29
success_rate: 96.7%
latency_p50: 20.522s
latency_p95: 32.668s
latency_mean: 21.620s
latency_max: 43.858s
mean_input_tokens: 5,897
mean_output_tokens: 495
mean_tool_calls: 9.21
estimated_cost_usd_per_run: $0.017744
mean_ndcg_at_3: 0.589
hit_at_1_rate: 69.0%
mean_precision_at_3: 0.356
mean_recall_at_3: 0.517
n_scored: 29
mean_justification_quality: 3.00/5
mean_judge_score: 13.41/20
judge_n: 29
hit_step_limit_rate: 0.0%
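
For readers unfamiliar with the ranking metrics listed above, here is a minimal sketch of how NDCG@3, Hit@1, precision@3, and recall@3 are commonly computed for a single run. It assumes binary relevance and the standard log2 discount; the report does not state which variant the harness uses, and all function names here are hypothetical.

```typescript
// Discounted cumulative gain over the top k positions (log2 discount assumed).
function dcgAtK(gains: number[], k: number): number {
  return gains.slice(0, k).reduce((sum, g, i) => sum + g / Math.log2(i + 2), 0);
}

// NDCG@k with binary relevance: DCG of the ranking divided by the best possible DCG.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  const gains: number[] = ranked.map((id) => (relevant.has(id) ? 1 : 0));
  const idealHits = Math.min(relevant.size, k);
  const ideal = dcgAtK(new Array<number>(idealHits).fill(1), k);
  return ideal === 0 ? 0 : dcgAtK(gains, k) / ideal;
}

// Hit@1: is the top-ranked item relevant?
function hitAt1(ranked: string[], relevant: Set<string>): boolean {
  return ranked.length > 0 && relevant.has(ranked[0]);
}

// Number of relevant items among the top k.
function hitsInTopK(ranked: string[], relevant: Set<string>, k: number): number {
  return ranked.slice(0, k).filter((id) => relevant.has(id)).length;
}

function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return hitsInTopK(ranked, relevant, k) / k;
}

function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return relevant.size === 0 ? 0 : hitsInTopK(ranked, relevant, k) / relevant.size;
}

// Illustrative data only, not taken from the benchmark.
const ranked = ["a", "b", "c"];
const relevant = new Set(["b", "d"]);
console.log(ndcgAtK(ranked, relevant, 3), hitAt1(ranked, relevant));
```

The reported values (for example mean_ndcg_at_3) would then be the mean of these per-run scores over the 29 scored runs, assuming the harness averages per run.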