SWE-bench Verified
Verified subset of SWE-bench, the canonical coding-agent benchmark.
SWE-bench Verified is the current canonical evaluation for coding agents. As of April 2026 it is approaching saturation, meaning top scores leave little headroom to distinguish leading systems; see the news on this.