VulcanBench
Open-source benchmarking for LLMs on realistic, multi-file software engineering tasks.
About
An evaluation harness that measures LLM performance across reasoning effort, language, and codebase scale using a tool-calling agent loop. It includes 52 gold-verified tasks, a Docker sandbox for isolated tool execution, and a local dashboard for analyzing traces and five-metric scores.
Details
- Built with
- GLM-5.2 · NEW · Strong
The post states it 'adds first-class support for GLM 5.2 through ZAI', and the repo README explicitly lists 'zai:glm-5.2' as a supported provider.
The creator specifically highlights this model support in the release announcement, and the integration is documented in the source code.
- Creator
- Source date
- Published on X Jun 21, 2026
- Listed
- Added to Dropday 1h ago · model released Jun 16, 2026
- Evidence
- Strong
The project is a mature open-source repository with a substantial codebase, documentation, and multiple releases.
Media & coverage
sourced from 1 post

