Topic

coding-agents

Coverage, reference pages, tools, and guides connected to this topic.

  1. SWE-bench Verified hits 78%, prompting calls for a harder coding eval

    Top coding agents now resolve more than three of every four tasks in SWE-bench Verified, reigniting debate over whether the benchmark still discriminates between systems.