News
The Amazon Q Developer Agent is evaluated on SWT-Bench and achieves SOTA unit test generation results on both SWT-Bench Lite (37.3%) and Verified (48.7%).
We release SWT-Bench.
We evaluated OpenHands on SWT-Bench, achieving a 22.8% success rate on Lite with the vanilla setup. We found that setting up the CI environment for the agent significantly improves the results, to 28.3% on Lite and 27.7% on Verified.
We evaluated AEGIS on SWT-Bench Lite, achieving a 47.8% success rate and a 26.0% coverage increase. AEGIS is the first submitted agent specifically tailored to the task of software testing and achieves state-of-the-art results.