Total Evaluations: 12,847
CVEs in Database: 559
Models Tested: 47
Avg. Evaluation Time: 2.3 min

Model Leaderboard

Rankings are updated every 24 hours

Rank  Model              Sub-scores (C / P / M)  Score  Submissions  Last Updated
#1    GPT-4 Turbo        96.1% / 92.3% / 94.2%   94.2%  1,247        Jan 15, 2024
#2    Claude 3.5 Sonnet  94.5% / 91.2% / 92.7%   92.8%  1,089        Jan 15, 2024
#3    Gemini Pro 1.5     91.2% / 87.6% / 89.4%   89.4%  892          Jan 15, 2024
#4    GPT-4              88.9% / 85.3% / 87.1%   87.1%  2,341        Jan 15, 2024
#5    Claude 3 Opus      86.2% / 83.2% / 84.7%   84.7%  756          Jan 15, 2024
#6    Llama 3.1 405B     83.1% / 79.5% / 81.3%   81.3%  623          Jan 15, 2024
#7    Mistral Large      80.4% / 77.4% / 78.9%   78.9%  445          Jan 15, 2024
#8    CodeLlama 70B      76.8% / 72.4% / 74.6%   74.6%  334          Jan 15, 2024

Vulnerability Categories

Test coverage across security domains

Category           CVEs  Last Updated
Web Security       156   2024-01-15
Memory Safety      89    2024-01-14
Injection Attacks  124   2024-01-13
Authentication     67    2024-01-12
Cryptographic      45    2024-01-11
Access Control     78    2024-01-10

Submit Your Model

Evaluate your AI model against our comprehensive CVE dataset

Submission Requirements

  • Model accepts a vulnerability description and the affected code context as input
  • Model returns a patch in standard diff format (see the request/response sketch below)
  • Model is reachable via an API endpoint and complies with our rate limits
  • Evaluation completes within 5 minutes per CVE
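The exact API schema is not published here, so the Python sketch below only illustrates the kind of exchange your endpoint should be able to serve: the harness sends a vulnerability description plus code context, and your model answers with a patch in diff format. The endpoint URL and the field names (cve_id, description, code_context, patch) are assumptions for illustration, not the benchmark's documented interface.

```python
import requests

# Hypothetical endpoint -- stands in for the URL you register at submission time.
SUBMIT_URL = "https://example.com/api/v1/patch"

# Assumed payload shape: a vulnerability description plus the affected code context.
payload = {
    "cve_id": "CVE-2024-0000",  # placeholder identifier
    "description": "SQL injection in the login handler via the 'username' parameter.",
    "code_context": open("login_handler.py").read(),
}

# The per-CVE budget is 5 minutes, so the client-side timeout mirrors that limit.
resp = requests.post(SUBMIT_URL, json=payload, timeout=300)
resp.raise_for_status()

# The model is expected to answer with a patch in standard diff format,
# e.g. a block starting with "--- a/login_handler.py" / "+++ b/login_handler.py".
patch_diff = resp.json()["patch"]
print(patch_diff)
```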

Evaluation Process

1. Automated Testing: your model is tested against our CVE dataset (a simplified harness sketch follows these steps).
2. Quality Assessment: patches are evaluated for correctness and quality.
3. Leaderboard Update: results are published to the public leaderboard.
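In outline, those three steps amount to: request a patch for each CVE, apply it to the vulnerable code, re-run the exploit and the regression tests, and aggregate the pass rate. The sketch below is a minimal illustration under assumed conventions (a get_patch callable, per-CVE repo_dir, exploit_cmd, and test_cmd fields, and git apply for patching); it is not the benchmark's actual harness.

```python
import subprocess
import tempfile

def evaluate_model(get_patch, cves):
    """Minimal harness sketch. `get_patch(cve)` returns diff text; each entry in
    `cves` is assumed to carry 'repo_dir', 'exploit_cmd', and 'test_cmd' keys."""
    passed = 0
    for cve in cves:
        # Step 1: Automated Testing -- ask the model for a patch for this CVE.
        diff_text = get_patch(cve)

        # Write the diff to a temp file and try to apply it to the vulnerable code.
        with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
            f.write(diff_text)
            patch_file = f.name
        applied = subprocess.run(["git", "apply", patch_file],
                                 cwd=cve["repo_dir"]).returncode == 0

        # Step 2: Quality Assessment -- the exploit should now fail and the
        # project's existing test suite should still pass.
        exploit_blocked = applied and subprocess.run(
            cve["exploit_cmd"], cwd=cve["repo_dir"]).returncode != 0
        tests_pass = applied and subprocess.run(
            cve["test_cmd"], cwd=cve["repo_dir"]).returncode == 0

        passed += int(exploit_blocked and tests_pass)

    # Step 3: Leaderboard Update -- this aggregate pass rate is the kind of
    # figure a leaderboard entry would be built from.
    return passed / len(cves)
```

A real harness would also enforce the 5-minute-per-CVE budget and track the separate sub-scores shown on the leaderboard, which this sketch omits.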