DuoBench

Apps & Tools

Demo

A benchmarking skill for coding agents to measure quality-per-dollar.

Built with

Kimi K2.7 Code

Model credit: Strong. The project documentation and associated media explicitly credit the model for use in the benchmarking scenarios.

The project README uses examples of Kimi models for benchmarking and the related YouTube video description explicitly links to the Moonshot AI Hugging Face page and credits Kimi K2.7.

Build evidence

Strong. This is a functional open-source project hosted on GitHub, with clear documentation, scripts, and instructions for local installation and usage.

Creator

alejandro-ao @alejandro-ao

Shipped

5h ago · model from Jun 12, 2026

DuoBench is a self-contained skill for coding agents like Claude Code that orchestrates multi-agent benchmarks. It measures the performance and cost-efficiency of planner/implementer model pairings on real GitHub issues, providing aggregated results, seaborn plots, and isolated worktree commits for manual inspection.
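As a rough illustration of what a quality-per-dollar comparison across planner/implementer pairings could look like, here is a minimal Python sketch. It is not taken from the DuoBench code: the `Run` record, the `passed` quality signal, the `cost_usd` field, the `quality_per_dollar` function, and the placeholder model names are all assumptions made for this example, and the project may define quality and cost differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run on one issue (hypothetical record, not DuoBench's schema)."""
    planner: str
    implementer: str
    passed: bool      # assumed quality signal: did the change resolve the issue?
    cost_usd: float   # assumed total spend for this run, in US dollars


def quality_per_dollar(runs):
    """Group runs by (planner, implementer) and compute pass rate per mean dollar."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run.planner, run.implementer)].append(run)

    summary = {}
    for pair, pair_runs in grouped.items():
        pass_rate = sum(r.passed for r in pair_runs) / len(pair_runs)
        mean_cost = sum(r.cost_usd for r in pair_runs) / len(pair_runs)
        summary[pair] = {
            "pass_rate": pass_rate,
            "mean_cost_usd": mean_cost,
            # Undefined when a pairing cost nothing; report None instead of dividing by zero.
            "quality_per_dollar": pass_rate / mean_cost if mean_cost > 0 else None,
        }
    return summary


# Placeholder model names and made-up numbers, purely for illustration.
runs = [
    Run("model-a", "model-b", True, 0.50),
    Run("model-a", "model-b", False, 0.50),
    Run("model-b", "model-b", True, 0.25),
    Run("model-b", "model-b", True, 0.25),
]
summary = quality_per_dollar(runs)
best_pair = max(summary, key=lambda pair: summary[pair]["quality_per_dollar"] or 0.0)
```

Dividing pass rate by mean cost is only one possible definition. A ratio like this rewards cheap pairings heavily, so it is usually worth reading alongside the raw pass rate rather than on its own.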

#benchmarking #llm-eval #coding-agents #agentic-workflow
