Hangarvibe

hg bench

HangarBench

Measure your agents. Improve what you can prove.

HangarBench is an open benchmark suite for agentic coding tasks. It measures your specific agent setup (model, system prompt, and tool configuration) against a standardized task library, so you can compare results across models, prompts, and configurations. Publish results to the public leaderboard or keep them private. Getting better at running agents starts with measuring the right things.

What's included

Everything HangarBench does

  • Standardized task library covering real-world coding scenarios
  • Compare any two configurations side by side
  • Public leaderboard for community benchmarking
  • Private mode — run benchmarks without publishing results
  • Prompt sensitivity analysis — see how small changes affect output
  • Credit cost tracking — optimize for quality per credit spent
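
To make "quality per credit spent" concrete, here is a minimal sketch of how two configurations might be compared. It is not HangarBench's actual API or result schema: the `RunResult` class, its field names, the `better_value` helper, and the numbers are all hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one benchmark run (not a real HangarBench type)."""
    name: str             # label for the configuration under test
    tasks_passed: int     # tasks the agent completed successfully
    tasks_total: int      # tasks attempted
    credits_spent: float  # credits consumed by the whole run

    @property
    def pass_rate(self) -> float:
        # Fraction of attempted tasks that passed.
        return self.tasks_passed / self.tasks_total

    @property
    def passes_per_credit(self) -> float:
        # One possible "quality per credit" measure: passed tasks per credit.
        return self.tasks_passed / self.credits_spent


def better_value(a: RunResult, b: RunResult) -> RunResult:
    """Return the run with more passed tasks per credit (ties go to `a`)."""
    return a if a.passes_per_credit >= b.passes_per_credit else b


# Illustrative numbers only, not measured results.
baseline = RunResult("baseline-prompt", tasks_passed=42, tasks_total=50, credits_spent=120.0)
candidate = RunResult("longer-prompt", tasks_passed=45, tasks_total=50, credits_spent=180.0)

for run in (baseline, candidate):
    print(f"{run.name}: pass rate {run.pass_rate:.0%}, "
          f"{run.passes_per_credit:.2f} passes per credit")

print("Better value:", better_value(baseline, candidate).name)
```

In this made-up case the candidate passes more tasks overall, but the baseline delivers more passes per credit. That is the kind of trade-off a side-by-side comparison with cost tracking is meant to surface.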