Apps & Tools
DuoBench
A benchmarking skill for coding agents to measure quality-per-dollar.
Built with
Kimi K2.7 Code
Model credit: Strong. The project documentation and associated media provide explicit, direct credits to the model for use in the benchmarking scenarios. The project README uses Kimi models as benchmarking examples, and the related YouTube video description explicitly links to the Moonshot AI Hugging Face page and credits Kimi K2.7.
Build evidence: Strong. This is a functional open-source project hosted on GitHub with clear documentation, scripts, and instructions for local installation and usage.
Creator
alejandro-ao @alejandro-ao
Shipped
5h ago · model from Jun 12, 2026
DuoBench is a self-contained skill for coding agents like Claude Code that orchestrates multi-agent benchmarks. It measures the performance and cost-efficiency of planner/implementer model pairings on real GitHub issues, providing aggregated results, seaborn plots, and isolated worktree commits for manual inspection.
#benchmarking #llm-eval #coding-agents #agentic-workflow
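As a rough illustration of the quality-per-dollar idea in the description, the sketch below groups runs by planner/implementer pairing and divides the mean quality score by the mean cost. It is not taken from DuoBench itself: the `Run` schema, the 0-to-1 score scale, the `quality_per_dollar` helper, and the model names are all hypothetical, and the numbers are placeholders rather than real benchmark results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One run of a planner/implementer pairing on one issue (hypothetical schema)."""
    planner: str
    implementer: str
    score: float      # quality score; the 0-to-1 scale is an assumption
    cost_usd: float   # total spend for the run, in US dollars


def quality_per_dollar(runs):
    """Group runs by pairing and return {(planner, implementer): mean score / mean cost}."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run.planner, run.implementer)].append(run)

    results = {}
    for pairing, group in grouped.items():
        mean_score = sum(r.score for r in group) / len(group)
        mean_cost = sum(r.cost_usd for r in group) / len(group)
        # Guard against zero-cost runs (e.g. a local model) to avoid division by zero.
        results[pairing] = mean_score / mean_cost if mean_cost > 0 else float("inf")
    return results


# Placeholder values for illustration only, not real benchmark results.
runs = [
    Run("model-a", "model-b", score=0.8, cost_usd=0.50),
    Run("model-a", "model-b", score=0.6, cost_usd=0.50),
    Run("model-b", "model-b", score=0.5, cost_usd=0.25),
]

# Print pairings from best to worst quality-per-dollar.
for pairing, qpd in sorted(quality_per_dollar(runs).items(), key=lambda kv: -kv[1]):
    print(pairing, round(qpd, 2))
```

Dividing mean score by mean cost is only one possible definition; averaging each run's own score-to-cost ratio would weight cheap runs more heavily and can rank pairings differently.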
