- Total Evaluations: 12,847
- CVE Database: 559 CVEs
- Models Tested: 47
- Avg. Evaluation Time: 2.3 min
Model Leaderboard
Rankings updated every 24 hours
| Rank | Model | Sub-scores | Score | Submissions | Last Updated |
|------|-------|------------|-------|-------------|--------------|
| #1 | GPT-4 Turbo | C: 96.1% • P: 92.3% • M: 94.2% | 94.2% | 1,247 | Jan 15, 2024 |
| #2 | Claude 3.5 Sonnet | C: 94.5% • P: 91.2% • M: 92.7% | 92.8% | 1,089 | Jan 15, 2024 |
| #3 | Gemini Pro 1.5 | C: 91.2% • P: 87.6% • M: 89.4% | 89.4% | 892 | Jan 15, 2024 |
| #4 | GPT-4 | C: 88.9% • P: 85.3% • M: 87.1% | 87.1% | 2,341 | Jan 15, 2024 |
| #5 | Claude 3 Opus | C: 86.2% • P: 83.2% • M: 84.7% | 84.7% | 756 | Jan 15, 2024 |
| #6 | Llama 3.1 405B | C: 83.1% • P: 79.5% • M: 81.3% | 81.3% | 623 | Jan 15, 2024 |
| #7 | Mistral Large | C: 80.4% • P: 77.4% • M: 78.9% | 78.9% | 445 | Jan 15, 2024 |
| #8 | CodeLlama 70B | C: 76.8% • P: 72.4% • M: 74.6% | 74.6% | 334 | Jan 15, 2024 |
Vulnerability Categories
Test coverage across security domains
| Category | CVEs | Last Updated |
|----------|------|--------------|
| Web Security | 156 | 2024-01-15 |
| Memory Safety | 89 | 2024-01-14 |
| Injection Attacks | 124 | 2024-01-13 |
| Authentication | 67 | 2024-01-12 |
| Cryptographic | 45 | 2024-01-11 |
| Access Control | 78 | 2024-01-10 |
Submit Your Model
Evaluate your AI model against our comprehensive CVE dataset
Submission Requirements
- Model accepts a vulnerability description and code context (see the request/response sketch after this list)
- Returns a patch in standard diff format
- Exposes an API endpoint and complies with our rate limits
- Evaluation completes within 5 minutes per CVE
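
A minimal sketch of a conforming endpoint, assuming a JSON request and response. The route path (`/v1/patch`), the field names (`vulnerability_description`, `code_context`, `patch`), and the hard-coded diff are illustrative placeholders, not the official contract; consult the submission docs for the actual schema.

```python
# Hypothetical submission endpoint: paths, field names, and the returned
# diff are illustrative only -- not the official API contract.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/v1/patch")
def generate_patch():
    payload = request.get_json(force=True)
    description = payload["vulnerability_description"]  # e.g. CVE summary text
    code_context = payload["code_context"]              # vulnerable source snippet

    # Call your model here; a real implementation would prompt it with the
    # description and code, then post-process the output into a diff.
    patch = run_model(description, code_context)

    # The harness expects a patch in standard diff format.
    return jsonify({"patch": patch})

def run_model(description: str, code_context: str) -> str:
    # Placeholder: return a unified diff produced by your model.
    return (
        "--- a/app/auth.py\n"
        "+++ b/app/auth.py\n"
        "@@ -10,7 +10,7 @@\n"
        "-    query = f\"SELECT * FROM users WHERE name = '{name}'\"\n"
        "+    query = \"SELECT * FROM users WHERE name = %s\"\n"
    )

if __name__ == "__main__":
    app.run(port=8000)
```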
Evaluation Process
1. Automated Testing: your model is tested against our CVE dataset.
2. Quality Assessment: patches are evaluated for correctness and quality (a simplified harness loop is sketched below).
3. Leaderboard Update: results are published to the public leaderboard.
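
The harness itself is not published here, but the first two steps roughly amount to applying each returned patch and re-running the associated tests. The sketch below assumes exactly that, with a hypothetical `CveCase` structure and a plain pass-rate score; the real C/P/M sub-scores and their weights are not documented on this page.

```python
# Hypothetical evaluation loop -- only sketches the "apply patch, re-run
# tests, aggregate" flow described above; not the actual harness.
import subprocess
from dataclasses import dataclass

@dataclass
class CveCase:
    repo_dir: str        # checkout of the vulnerable project
    patch: str           # diff returned by the model under test
    test_cmd: list[str]  # command exercising the vulnerability and regressions

def evaluate_case(case: CveCase) -> bool:
    """Apply the model's patch and report whether the test command passes."""
    check = subprocess.run(
        ["git", "apply", "--check", "-"],
        input=case.patch, text=True, cwd=case.repo_dir,
    )
    if check.returncode != 0:  # malformed or non-applying diff
        return False
    subprocess.run(["git", "apply", "-"], input=case.patch, text=True,
                   cwd=case.repo_dir, check=True)
    tests = subprocess.run(case.test_cmd, cwd=case.repo_dir)
    return tests.returncode == 0

def score(cases: list[CveCase]) -> float:
    """Fraction of CVEs patched successfully, as a percentage."""
    passed = sum(evaluate_case(c) for c in cases)
    return 100.0 * passed / len(cases) if cases else 0.0
```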