News
We released SWT-Bench.
We evaluated OpenHands on SWT-Bench, achieving a 22.8% success rate on Lite with the vanilla setup. We found that setting up the CI environment for the agent significantly improves the results, to 28.3% on Lite and 27.7% on Verified.
We evaluated AEGIS on SWT-Bench Lite, achieving a 47.8% success rate and a 26.0% coverage increase. AEGIS is the first submitted agent specifically tailored to software testing, and it achieves state-of-the-art results.