SecPatchBench - ObscureLabs

Why SecPatchBench Matters

The industry needs standardized patch evaluation

The Evaluation Gap

Without standardized benchmarks, every team evaluates patches differently, so results cannot be compared across teams and progress is hard to measure.

Exploit-Based Testing

We validate patches against actual exploits and proofs of concept, not just unit tests. If the exploit still works against the patched code, the patch fails.
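The pass/fail rule above can be sketched in a few lines of Python. This is an illustrative sketch, not SecPatchBench's actual implementation: the `Verdict` enum, the `judge_patch` function, and the extra `INVALID` outcome (for a proof of concept that does not reproduce on the unpatched code) are assumptions introduced here.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    INVALID = "invalid"  # hypothetical outcome: the PoC never worked to begin with


def judge_patch(exploit_works_unpatched: bool,
                exploit_works_patched: bool,
                unit_tests_pass: bool) -> Verdict:
    """Hypothetical verdict rule: the exploit decides first, unit tests second."""
    if not exploit_works_unpatched:
        # Assumption: a PoC that fails on the vulnerable code proves nothing.
        return Verdict.INVALID
    if exploit_works_patched:
        # The exploit still works, so the patch fails regardless of unit tests.
        return Verdict.FAIL
    if not unit_tests_pass:
        # The exploit is blocked, but the patch broke existing behavior.
        return Verdict.FAIL
    return Verdict.PASS
```

Checking the exploit against the unpatched code first guards against a false pass caused by a broken proof of concept rather than a working patch.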

Reproducible Results

Standardized datasets and metrics enable fair comparison between different patch generation approaches and models.
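As a rough illustration of why a shared dataset matters, the sketch below scores two approaches against the same list of CVE IDs. The `fix_rate` function, the metric itself, and the placeholder IDs are assumptions made for this example; they are not taken from SecPatchBench.

```python
from typing import Dict, List


def fix_rate(results: Dict[str, bool], cve_ids: List[str]) -> float:
    """Fraction of the shared CVE list that an approach patched successfully.

    CVEs with no recorded result count as failures, so an approach cannot
    raise its score by skipping hard cases.
    """
    if not cve_ids:
        raise ValueError("cve_ids must not be empty")
    fixed = sum(1 for cve in cve_ids if results.get(cve, False))
    return fixed / len(cve_ids)


# Placeholder IDs, made up for this example.
shared_cves = ["CVE-A", "CVE-B", "CVE-C", "CVE-D"]
approach_a = {"CVE-A": True, "CVE-B": True, "CVE-C": False, "CVE-D": True}
approach_b = {"CVE-A": True, "CVE-B": False}  # never attempted CVE-C or CVE-D

print(fix_rate(approach_a, shared_cves))
print(fix_rate(approach_b, shared_cves))
```

Dividing by the size of the shared list, rather than by the number of CVEs each approach attempted, is what keeps the two numbers comparable.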

How It Works

Rigorous evaluation methodology

1. CVE Selection: load the vulnerability dataset

2. Exploit Setup

3. Patch Input

4. Sandbox Execution

5. Validation Tests

6. Score Output
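One way to picture the six stages above is as an ordered pipeline that halts at the first failure. The sketch below is an assumption-laden illustration: only the stage names come from this page, while `run_stages`, the handler signature, and the halt-on-failure behavior are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Stage names as listed on this page, in order.
STAGES = [
    "CVE Selection",
    "Exploit Setup",
    "Patch Input",
    "Sandbox Execution",
    "Validation Tests",
    "Score Output",
]


def run_stages(handlers: Dict[str, Callable[[dict], bool]],
               context: dict) -> List[Tuple[str, str]]:
    """Run each stage handler in order; skip everything after the first failure."""
    results = []
    halted = False
    for name in STAGES:
        if halted:
            results.append((name, "skipped"))
            continue
        ok = handlers[name](context)  # each handler returns True on success
        results.append((name, "ok" if ok else "failed"))
        halted = not ok
    return results
```

Each handler receives a shared `context` dict, so an earlier stage (for example, loading the CVE record) can hand data to a later one.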

Validation Terminal

→ Loading CVE-2024-1234 from dataset
Type: SQL Injection | Language: Python
Severity: Critical (CVSS 9.8)

Open Source

Transparent by design

SecPatchBench is fully open source because security shouldn't be a black box. Audit our methods, contribute improvements, and build confidence in autonomous patching.

⭐ 487 GitHub Stars

👥 23 Contributors

📦 1.2K Weekly Downloads

🛡️ 500+ CVEs Validated

# Quick Start

pip install secpatchbench

# Run validation on your patches
secpatchbench validate \
  --patch ./fix-cve-2024-1234.diff \
  --exploit ./poc.py \
  --sandbox docker

# Results
{
  "vulnerability_fixed": true,
  "exploit_prevented": true,
  "regression_tests": "passed",
  "performance_impact": "negligible",
  "confidence_score": 0.992
}
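A result like the one above could be consumed programmatically, for example to gate a merge. The sketch below is a minimal example using only Python's standard `json` module. The field names come from the sample output; the `patch_accepted` helper and the 0.9 confidence threshold are assumptions, not documented SecPatchBench behavior.

```python
import json


def patch_accepted(report_text: str, min_confidence: float = 0.9) -> bool:
    """Return True only if every check in the report passed.

    min_confidence is a hypothetical threshold chosen for this example.
    """
    report = json.loads(report_text)
    return (
        report.get("vulnerability_fixed") is True
        and report.get("exploit_prevented") is True
        and report.get("regression_tests") == "passed"
        and float(report.get("confidence_score", 0.0)) >= min_confidence
    )


sample = """
{
  "vulnerability_fixed": true,
  "exploit_prevented": true,
  "regression_tests": "passed",
  "performance_impact": "negligible",
  "confidence_score": 0.992
}
"""
print(patch_accepted(sample))
```

Missing fields are treated as failures here, so a truncated or malformed report cannot be mistaken for a passing one.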

Trusted by security teams worldwide

Start benchmarking your AI patches

Join the growing community of researchers and companies using SecPatchBench to validate and improve their patch generation models.

Open source

Production tested

Exploit validated

The industry standard for AI patch validation