
langgraph

Python
StateGraph + MessagesState
Model: gemini-2.5-flash
Generated: May 11, 2026, 12:00 AM UTC
NDCG@3: 0.823 (27 scored · 27/30 valid)
Hit@1: 92.6% (justification quality 3.70/5)
Latency p50: 17.1s (p95 25.9s)
Mean tokens: 5,669 (5,167 in · 502 out)
Cost / run: $0.0164 (8.8 avg tool calls)

All metrics

count_total: 30
count_valid: 27
success_rate: 90.0%
latency_p50: 17.087s
latency_p95: 25.861s
latency_mean: 17.118s
latency_max: 29.841s
mean_input_tokens: 5,167
mean_output_tokens: 502
mean_tool_calls: 8.81
estimated_cost_usd_per_run: $0.016371
mean_ndcg_at_3: 0.823
hit_at_1_rate: 92.6%
mean_precision_at_3: 0.531
mean_recall_at_3: 0.568
n_scored: 27
mean_justification_quality: 3.70/5
mean_judge_score: 15.59/20
judge_n: 27
hit_step_limit_rate: 0.0%
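
The ranking metrics listed above (mean_ndcg_at_3, hit_at_1_rate, mean_precision_at_3, mean_recall_at_3) are averages of per-run scores. This page does not say how each per-run score is computed, so the following is only a minimal sketch of the standard definitions, assuming binary relevance (an item is either relevant or not) and a cutoff of k = 3. The function names and the example item IDs are hypothetical and are not taken from the benchmark's code.

```python
import math
from typing import Sequence, Set


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """NDCG@k with binary gains (assumption: no graded relevance)."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked]
    # The ideal ranking places as many relevant items as possible at the top.
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the top-k slots filled with relevant items."""
    return sum(item in relevant for item in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in ranked[:k]) / len(relevant)


def hit_at_1(ranked: Sequence[str], relevant: Set[str]) -> bool:
    """True when the top-ranked item is relevant."""
    return len(ranked) > 0 and ranked[0] in relevant


# Hypothetical single run: three returned items, three relevant items.
ranked = ["a", "x", "b"]
relevant = {"a", "b", "c"}
scores = {
    "ndcg_at_3": ndcg_at_k(ranked, relevant),
    "precision_at_3": precision_at_k(ranked, relevant),
    "recall_at_3": recall_at_k(ranked, relevant),
    "hit_at_1": hit_at_1(ranked, relevant),
}
print(scores)
```

Under these assumptions, each leaderboard value would be the mean of the corresponding per-run score over the n_scored runs (27 here). A benchmark that uses graded relevance or a different discount would produce different numbers.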