Total Evaluations: 12,847
CVEs in Database: 559
Models Tested: 47
Avg. Evaluation Time: 2.3 min

Model Leaderboard

Rankings are updated every 24 hours

Rank  Model              Sub-scores (C / P / M)  Score  Submissions  Last Updated
#1    GPT-4 Turbo        96.1% / 92.3% / 94.2%   94.2%  1,247        Jan 15, 2024
#2    Claude 3.5 Sonnet  94.5% / 91.2% / 92.7%   92.8%  1,089        Jan 15, 2024
#3    Gemini Pro 1.5     91.2% / 87.6% / 89.4%   89.4%  892          Jan 15, 2024
#4    GPT-4              88.9% / 85.3% / 87.1%   87.1%  2,341        Jan 15, 2024
#5    Claude 3 Opus      86.2% / 83.2% / 84.7%   84.7%  756          Jan 15, 2024
#6    Llama 3.1 405B     83.1% / 79.5% / 81.3%   81.3%  623          Jan 15, 2024
#7    Mistral Large      80.4% / 77.4% / 78.9%   78.9%  445          Jan 15, 2024
#8    CodeLlama 70B      76.8% / 72.4% / 74.6%   74.6%  334          Jan 15, 2024

Vulnerability Categories

Test coverage across security domains

Category           CVEs  Last Updated
Web Security       156   2024-01-15
Memory Safety      89    2024-01-14
Injection Attacks  124   2024-01-13
Authentication     67    2024-01-12
Cryptographic      45    2024-01-11
Access Control     78    2024-01-10

Submit Your Model

Evaluate your AI model against our comprehensive CVE dataset

Submission Requirements

  • Model accepts a vulnerability description and the affected code context as input
  • Model returns a patch in standard diff format (see the request/response sketch below)
  • Model is reachable via an API endpoint and complies with our rate limits
  • Evaluation completes within 5 minutes per CVE
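The exact API schema is not published here, so the Python sketch below only illustrates the kind of exchange your endpoint should be able to serve: the harness sends a vulnerability description plus code context, and your model answers with a patch in diff format. The endpoint URL and the field names (cve_id, description, code_context, patch) are assumptions for illustration, not the benchmark's documented interface.

```python
import requests

# Hypothetical endpoint -- stands in for the URL you register at submission time.
SUBMIT_URL = "https://example.com/api/v1/patch"

# Assumed payload shape: a vulnerability description plus the affected code context.
payload = {
    "cve_id": "CVE-2024-0000",  # placeholder identifier
    "description": "SQL injection in the login handler via the 'username' parameter.",
    "code_context": open("login_handler.py").read(),
}

# The per-CVE budget is 5 minutes, so the client-side timeout mirrors that limit.
resp = requests.post(SUBMIT_URL, json=payload, timeout=300)
resp.raise_for_status()

# The model is expected to answer with a patch in standard diff format,
# e.g. a block starting with "--- a/login_handler.py" / "+++ b/login_handler.py".
patch_diff = resp.json()["patch"]
print(patch_diff)
```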

Evaluation Process

1. Automated Testing: your model is tested against our CVE dataset (a simplified harness sketch follows these steps).
2. Quality Assessment: patches are evaluated for correctness and quality.
3. Leaderboard Update: results are published to the public leaderboard.
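In outline, those three steps amount to: request a patch for each CVE, apply it to the vulnerable code, re-run the exploit and the regression tests, and aggregate the pass rate. The sketch below is a minimal illustration under assumed conventions (a get_patch callable, per-CVE repo_dir, exploit_cmd, and test_cmd fields, and git apply for patching); it is not the benchmark's actual harness.

```python
import subprocess
import tempfile

def evaluate_model(get_patch, cves):
    """Minimal harness sketch. `get_patch(cve)` returns diff text; each entry in
    `cves` is assumed to carry 'repo_dir', 'exploit_cmd', and 'test_cmd' keys."""
    passed = 0
    for cve in cves:
        # Step 1: Automated Testing -- ask the model for a patch for this CVE.
        diff_text = get_patch(cve)

        # Write the diff to a temp file and try to apply it to the vulnerable code.
        with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
            f.write(diff_text)
            patch_file = f.name
        applied = subprocess.run(["git", "apply", patch_file],
                                 cwd=cve["repo_dir"]).returncode == 0

        # Step 2: Quality Assessment -- the exploit should now fail and the
        # project's existing test suite should still pass.
        exploit_blocked = applied and subprocess.run(
            cve["exploit_cmd"], cwd=cve["repo_dir"]).returncode != 0
        tests_pass = applied and subprocess.run(
            cve["test_cmd"], cwd=cve["repo_dir"]).returncode == 0

        passed += int(exploit_blocked and tests_pass)

    # Step 3: Leaderboard Update -- this aggregate pass rate is the kind of
    # figure a leaderboard entry would be built from.
    return passed / len(cves)
```

A real harness would also enforce the 5-minute-per-CVE budget and track the separate sub-scores shown on the leaderboard, which this sketch omits.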