hg bench

HangarBench

Measure your agents. Improve what you can prove.


HangarBench is an open benchmark suite for agentic coding tasks. Measure your specific agent setup (model, system prompt, and tool configuration) against a standardized task library. Compare results across models, prompts, and configurations. Publish to the public leaderboard or keep results private. Getting better at running agents starts with measuring the right things.

What's included

Everything HangarBench does

Standardized task library covering real-world coding scenarios
Compare any two configurations side by side
Public leaderboard for community benchmarking
Private mode — run benchmarks without publishing results
Prompt sensitivity analysis — see how small changes affect output
Credit cost tracking — optimize for quality per credit spent
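
To make "quality per credit spent" and side-by-side comparison concrete, here is a minimal Python sketch of the arithmetic. It is not HangarBench's API or result format: the `RunSummary` class, its field names, the `compare` helper, and the numbers are all hypothetical, and it assumes quality is measured as the share of tasks passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical summary of one benchmark run (not a real HangarBench schema)."""
    name: str
    tasks_passed: int
    tasks_total: int
    credits_spent: float

    def __post_init__(self) -> None:
        # Guard against division by zero in the derived metrics below.
        if self.tasks_total <= 0 or self.credits_spent <= 0:
            raise ValueError("tasks_total and credits_spent must be positive")

    @property
    def pass_rate(self) -> float:
        # Assumed quality metric: fraction of tasks passed.
        return self.tasks_passed / self.tasks_total

    @property
    def quality_per_credit(self) -> float:
        return self.pass_rate / self.credits_spent


def compare(a: RunSummary, b: RunSummary) -> dict[str, float]:
    """Side-by-side comparison of two configurations (b relative to a)."""
    return {
        "pass_rate_delta": b.pass_rate - a.pass_rate,
        "credits_delta": b.credits_spent - a.credits_spent,
        "quality_per_credit_ratio": b.quality_per_credit / a.quality_per_credit,
    }


# Made-up numbers, for illustration only.
baseline = RunSummary("baseline-prompt", tasks_passed=30, tasks_total=50, credits_spent=120.0)
candidate = RunSummary("revised-prompt", tasks_passed=36, tasks_total=50, credits_spent=180.0)

if __name__ == "__main__":
    for metric, value in compare(baseline, candidate).items():
        print(f"{metric}: {value:.3f}")
```

With these illustrative numbers the revised prompt passes more tasks, but its quality-per-credit ratio falls below 1. The extra quality costs proportionally more credits, which is the kind of trade-off cost tracking is meant to surface.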
