VulcanBench

Open-source benchmarking for LLMs on realistic, multi-file software engineering tasks.

About

An evaluation harness that measures LLM performance across reasoning effort, language, and codebase scale using a tool-calling agent loop. It features 52 gold-verified tasks, a Docker sandbox for isolated tool execution, and a local dashboard for analyzing traces and five-metric scoring.

#benchmarking #llm-eval #developer-tools #software-engineering

Details

Built with: GLM-5.2 (new)
Strong
The creator specifically highlights the model support in the release announcement and the integration is documented in the source code.
The post states 'adds first-class support for GLM 5.2 through ZAI', and the repo README explicitly lists 'zai:glm-5.2' as a supported provider.
Creator: Morgan Linton @morganlinton
Source date: Published on X Jun 21, 2026
Listed: Added to Dropday 1h ago · model released Jun 16, 2026
Evidence: Strong
The project is a mature open-source repository with significant codebase structure, documentation, and multiple releases.

Media & coverage

sourced from 1 post

X post by Morgan (@morganlinton) (primary)

Similar

Ponytail (Apps & Tools)

OpenCode Go (Apps & Tools)

OpenClaude (Apps & Tools)