Why SecPatchBench Matters
The industry needs standardized patch evaluation
The Evaluation Gap
Without standardized benchmarks, every team evaluates patches differently. This makes progress hard to measure and results hard to compare across teams.
Exploit-Based Testing
We validate patches against working exploits and proof-of-concept code, not just unit tests. If the exploit still succeeds, the patch fails.
Reproducible Results
Standardized datasets and metrics enable fair comparison between different patch generation approaches and models.
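As an illustration of why a shared dataset and a single scoring formula make approaches comparable: if two models are scored on the same set of CVE cases with the same pass-rate definition, their numbers can be compared directly. The helper and the numbers below are hypothetical, not SecPatchBench output.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of benchmark cases where the patch fixed the flaw
    and blocked the exploit."""
    return sum(results) / len(results) if results else 0.0

# The same four benchmark cases, evaluated for two hypothetical models
model_a = [True, True, False, True]   # 3 of 4 cases pass
model_b = [True, False, False, True]  # 2 of 4 cases pass
```

Because both lists describe the identical case set, `pass_rate(model_a)` and `pass_rate(model_b)` are directly comparable; scores computed on different, private datasets would not be.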
How It Works
Rigorous evaluation methodology
CVE Selection
Load vulnerability dataset
Exploit Setup
Patch Input
Sandbox Execution
Validation Tests
Score Output
→ Loading CVE-2024-1234 from dataset
Type: SQL Injection | Language: Python
Severity: Critical (CVSS 9.8)
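The six stages above could be sketched as a minimal pipeline record: load a CVE case, apply the patch in a sandbox, replay the exploit, run regression tests, and emit a score. The class and function names here are illustrative assumptions, not the actual SecPatchBench API.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkCase:
    cve_id: str          # e.g. "CVE-2024-1234" (CVE Selection)
    vuln_type: str       # e.g. "SQL Injection"
    language: str
    cvss: float
    results: dict = field(default_factory=dict)

def run_case(case: BenchmarkCase, patch_applied: bool,
             exploit_blocked: bool, tests_passed: bool) -> BenchmarkCase:
    """Record the outcome of the sandboxed stages and compute a score."""
    case.results = {
        "patch_applied": patch_applied,        # Patch Input + Sandbox Execution
        "exploit_prevented": exploit_blocked,  # Exploit Setup replayed against the patch
        "regression_tests": tests_passed,      # Validation Tests
    }
    # Score Output: every stage must pass for the case to count as fixed
    case.results["score"] = 1.0 if all(case.results.values()) else 0.0
    return case
```

A case where the patch applies and the tests pass but the exploit still works would score 0.0, matching the rule that a working exploit fails the patch outright.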
Open Source
Transparent by design
SecPatchBench is fully open source because security shouldn't be a black box. Audit our methods, contribute improvements, and build confidence in autonomous patching.
GitHub Stars
Contributors
Weekly Downloads
CVEs Validated
# Quick Start
pip install secpatchbench

# Run validation on your patches
secpatchbench validate \
  --patch ./fix-cve-2024-1234.diff \
  --exploit ./poc.py \
  --sandbox docker

# Results
{
  "vulnerability_fixed": true,
  "exploit_prevented": true,
  "regression_tests": "passed",
  "performance_impact": "negligible",
  "confidence_score": 0.992
}
Start benchmarking your AI patches
Join the growing community of researchers and companies using SecPatchBench to validate and improve their patch generation models.