
VulcanBench

Open-source benchmarking for LLMs on realistic, multi-file software engineering tasks.

About

An evaluation harness that measures LLM performance across reasoning effort, language, and codebase scale using a tool-calling agent loop. It features 52 gold-verified tasks, a Docker sandbox for isolated tool execution, and a local dashboard for analyzing traces and scores across five metrics.

Details
Built with
GLM-5.2 (new)
Strong

The creator specifically highlights the model support in the release announcement, and the integration is documented in the source code.

The post states 'adds first-class support for GLM 5.2 through ZAI', and the repo README explicitly lists 'zai:glm-5.2' as a supported provider.
Source date
Published on X Jun 21, 2026
Listed
Added to Dropday 1h ago · model released Jun 16, 2026
Evidence
Strong

The project is a mature open-source repository with significant codebase structure, documentation, and multiple releases.



Media & coverage
sourced from 1 post