How our lifecycle analysis of 278,435 vulnerabilities led to ML models that predict exploitation at disclosure
AlBedah - ~246K CVEs analyzed through March 2024
AlBedah, Gashi, Howe - 278K CVEs (1999-2025)
Current Work - 327K CVEs
Each feature in our ML model is grounded in empirical findings from our lifecycle analysis and academic literature.
vendor_exploit_rate
Historical exploitation success by vendor
Academic Basis: Shahzad et al. (2012, 2020) vendor patterns
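As a rough illustration of how a historical per-vendor exploitation rate could be computed without leaking future information, the sketch below walks CVEs in disclosure order and scores each one using only that vendor's earlier CVEs. The function name, the record layout, and the `smoothing` and `prior` parameters are assumptions made for this example, not the page's actual implementation.

```python
from collections import defaultdict
from datetime import date


def vendor_exploit_rates(records, smoothing=1.0, prior=0.05):
    """Return {cve_id: rate}, where rate uses only the vendor's earlier CVEs.

    records: iterable of (cve_id, vendor, disclosed_date, exploited_bool).
    smoothing and prior are hypothetical knobs: a vendor with no history
    falls back to `prior` instead of dividing by zero.
    """
    seen = defaultdict(int)   # CVEs observed so far, per vendor
    hits = defaultdict(int)   # exploited CVEs observed so far, per vendor
    rates = {}
    # Process in disclosure order so each CVE sees only the past.
    for cve_id, vendor, _disclosed, exploited in sorted(records, key=lambda r: r[2]):
        rates[cve_id] = (hits[vendor] + smoothing * prior) / (seen[vendor] + smoothing)
        seen[vendor] += 1
        hits[vendor] += int(exploited)
    return rates


# Illustrative records with placeholder IDs and vendors.
example = [
    ("CVE-A", "acme", date(2020, 1, 1), True),
    ("CVE-B", "acme", date(2021, 1, 1), False),
    ("CVE-C", "acme", date(2022, 1, 1), False),
    ("CVE-D", "other", date(2022, 6, 1), False),
]
rates = vendor_exploit_rates(example)
```

One simplification to keep in mind: the sketch treats an earlier CVE's exploitation label as known by the time the next CVE is disclosed, which a production pipeline would need to check against the date the exploit was actually observed.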
version_risk_delta
Risk inheritance across versions
Academic Basis: Shahzad et al. (2020) + Garcia et al. (2011) supplementary
desc_similarity_max
17.9% importance - most predictive
Academic Basis: Suciu & Dumitras (Expected Exploitability, EE) + VLAI (Bonhomme & Dulaunoy 2025)
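The page does not say how description similarity is computed. The sketch below shows one simple possibility: the maximum bag-of-words cosine similarity between a new CVE description and the descriptions of previously exploited CVEs. The tokenizer, the helper names, and the choice of raw term counts (rather than TF-IDF or learned embeddings) are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def desc_similarity_max(new_description, exploited_descriptions):
    """Highest similarity to any known-exploited description (0.0 if none)."""
    new_vec = _vectorize(new_description)
    return max(
        (_cosine(new_vec, _vectorize(d)) for d in exploited_descriptions),
        default=0.0,
    )
```

Because it needs only the description text and a corpus of earlier exploited CVEs, a feature of this kind is available on the day of disclosure.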
cwe_ids
CWE-stratified exploitation rates
Academic Basis: MITRE CWE framework
exploit_patch_gap
Time window of vulnerability
Academic Basis: Frei (2009) "gap of insecurity"
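A minimal sketch of how such a gap could be measured, assuming each CVE carries a first-known-exploit date and a patch-release date. The function name matches the feature, but the sign convention (positive when the exploit precedes the patch) and the handling of missing dates are assumptions of this example.

```python
from datetime import date


def exploit_patch_gap(first_exploit, patch_released):
    """Days from first known exploit to patch release.

    Positive: the exploit appeared before the patch (an exposed window).
    Negative: the patch was available before the first known exploit.
    None: either date is unknown.
    """
    if first_exploit is None or patch_released is None:
        return None
    return (patch_released - first_exploit).days


# Exploit seen on 1 Jan 2024, patch shipped on 31 Jan 2024.
gap = exploit_patch_gap(date(2024, 1, 1), date(2024, 1, 31))
```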
Early Premium Model
25 day-0 features, AUC 0.9913
Source: NVD history database (4M+ records)
83.0% of exploited CVEs were exploited before a CVSS score was assigned, and 85.5% before CPE data was available (6 independent sources)
How to predict WITHOUT waiting for NVD?
Day-0 features: vendor history + description similarity
Result: Early Premium (AUC 0.9913) outperforms the full model!
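For readers unfamiliar with the metric, AUC is the probability that a randomly chosen exploited CVE receives a higher score than a randomly chosen non-exploited one. The sketch below computes it directly from that definition. It is a generic illustration with made-up scores, not the evaluation code behind the reported 0.9913, and its pairwise loop is quadratic, so it suits small samples only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for exploited, 0 for not exploited.
    scores: model scores, higher meaning more likely exploited.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```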
84.5% agreement validates text-based prediction
Severity patterns in exploited CVEs
7 sources (6 independent + NVD exploit tags) consolidated for ground truth
CVE dates vs first exploit timing
Coverage comparison across sources
Early Premium model top features
AUC 0.9913 performance curve
Exploitation by vulnerability type
Risk inheritance visualization
Framework for CVE-CWE-CPE-Exploit-Patch analysis (~246K CVEs)
278K CVEs: 75.8% attacker wins, +147.9 day gap acceleration
Multi-dimensional vulnerability characterization (56K CVEs)
Dynamic exploitation risk from text features
RoBERTa severity prediction: 84.5% CVSS agreement
OS diversity for intrusion tolerance
Experience the ML models built from these research findings