Research Journey

How our lifecycle analysis of 278,435 vulnerabilities led to ML models that predict exploitation at disclosure

Research Timeline

2024

Transfer Report

AlBedah - ~246K CVEs analyzed through March 2024

  • Established CVE-CWE-CPE-Exploit-Patch framework
  • Completed sub-objectives 1-6 (lifecycle foundations)
  • Identified ML prediction as Future Work

2025

"Longitudinal Vulnerability Lifecycle Analysis"

AlBedah, Gashi, Howe - 278K CVEs (1999-2025)

  • Attackers win 75.8% of exploit-patch races
  • Exploit-patch gap widened from -58 to +147.9 days (2010-2024)
  • 95% same-vendor version overlap

2026

ML Exploitation Prediction

Current Work - 327K CVEs

  • 83.0% of exploited CVEs were exploited before a CVSS score was available, 85.5% before CPE data (6 independent sources)
  • Early Premium: AUC 0.9913 with 25 features
  • Outperforms full 66-feature model

Lifecycle Findings to ML Features

Each feature in our ML model is grounded in empirical findings from our lifecycle analysis and academic literature.

Lifecycle Finding

"Attackers win 75.8% of races"

ML Feature

vendor_exploit_rate

Historical exploitation rate by vendor

Academic Basis: Shahzad et al. (2012, 2020) vendor patterns
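A minimal sketch of how a feature like vendor_exploit_rate could be computed without leaking future information. The page does not describe the actual computation, so the record schema, the function name, the placeholder CVE IDs, and the default value for vendors with no history are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def vendor_exploit_rates(records, prior=0.0):
    """Map each CVE to the share of its vendor's earlier CVEs that were exploited.

    records: iterable of (cve_id, vendor, published, exploited) tuples
    (hypothetical schema). Records are processed in publication order, with
    same-day ties kept in input order, so each CVE only sees CVEs that come
    before it. Vendors with no history yet get `prior` (an assumed default).
    """
    seen = defaultdict(lambda: [0, 0])  # vendor -> [exploited count, total count]
    rates = {}
    for cve_id, vendor, _published, exploited in sorted(records, key=lambda r: r[2]):
        hits, total = seen[vendor]
        rates[cve_id] = hits / total if total else prior
        # Update the vendor history only after the feature value is computed.
        seen[vendor][0] += int(exploited)
        seen[vendor][1] += 1
    return rates


# Toy data with placeholder IDs, not real CVEs.
records = [
    ("CVE-A", "acme", date(2020, 1, 1), True),
    ("CVE-B", "acme", date(2021, 1, 1), False),
    ("CVE-C", "acme", date(2022, 1, 1), True),
    ("CVE-D", "other", date(2022, 6, 1), False),
]
rates = vendor_exploit_rates(records)
```

Computing the rate from earlier CVEs only is one way to keep the feature usable at disclosure time; a rate computed over the whole dataset would include outcomes that are not yet known on day 0.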

Lifecycle Finding

"95% same-vendor version overlap"

ML Feature

version_risk_delta

Risk inheritance across versions

Academic Basis: Shahzad et al. (2020), with Garcia et al. (2011) as supplementary

Academic Precedent

VLAI: 84.5% CVSS Agreement

ML Feature

desc_similarity_max

17.9% feature importance, the most predictive feature

Academic Basis: Suciu & Dumitras (Expected Exploitability) + VLAI (Bonhomme & Dulaunoy 2025)
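To illustrate the idea behind desc_similarity_max, the sketch below scores a new description by its highest similarity to any description of a known exploited CVE. The page does not say which text representation or similarity measure the model uses; cosine similarity over raw token counts is swapped in here purely as a simple stand-in, and the helper names and sample descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def desc_similarity_max(new_description, exploited_descriptions):
    """Highest similarity between a new description and any exploited one.

    Returns 0.0 when there are no exploited descriptions to compare against.
    """
    new_vec = Counter(tokenize(new_description))
    return max(
        (cosine(new_vec, Counter(tokenize(d))) for d in exploited_descriptions),
        default=0.0,
    )


# Made-up descriptions for illustration only.
known_exploited = ["SQL injection in search form", "Buffer overflow in parser"]
score = desc_similarity_max("SQL injection in login form", known_exploited)
# score is approximately 0.8: four of five tokens match the first description.
```

Because the feature needs only the description text, it is available on the day a CVE is disclosed.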

Lifecycle Finding

"SQL Injection: 91% to 40%"

ML Feature

cwe_ids

CWE-stratified exploitation rates

Academic Basis: MITRE CWE framework
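One way to picture CWE-stratified exploitation rates is a grouped count, sketched below. Grouping by year as well as by CWE is an assumption, made so that a shift in a single CWE's rate would be visible; the page does not state how the stratification is done. The record schema and the toy data are hypothetical. CWE-89 is SQL injection and CWE-79 is cross-site scripting.

```python
from collections import defaultdict


def cwe_exploit_rates(records):
    """Exploitation rate per (CWE, year).

    records: iterable of (cwe_ids, year, exploited) tuples (hypothetical
    schema). A CVE tagged with several CWEs counts toward each of them.
    """
    counts = defaultdict(lambda: [0, 0])  # (cwe, year) -> [exploited, total]
    for cwe_ids, year, exploited in records:
        for cwe in cwe_ids:
            counts[(cwe, year)][0] += int(exploited)
            counts[(cwe, year)][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


# Toy data, not taken from the study.
records = [
    (["CWE-89"], 2010, True),
    (["CWE-89"], 2010, True),
    (["CWE-89"], 2020, True),
    (["CWE-89"], 2020, False),
    (["CWE-79", "CWE-89"], 2020, False),
]
cwe_rates = cwe_exploit_rates(records)
```

In this toy data the CWE-89 rate falls from 1.0 in 2010 to one third in 2020, the kind of shift a per-period breakdown is meant to expose.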

Lifecycle Finding

"Gap widened to +147.9 days"

ML Feature

exploit_patch_gap

Time window of vulnerability

Academic Basis: Frei (2009) "gap of insecurity"
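The exploit-patch gap and the race outcome can be sketched as simple date arithmetic. The sign convention used here (positive when the exploit appears before the patch) is an assumption chosen to match the "+147.9 days" wording; the page does not define it. Function names and dates are hypothetical.

```python
from datetime import date


def exploit_patch_gap(exploit_date, patch_date):
    """Days by which the exploit preceded the patch.

    Assumed sign convention: positive means the exploit came first,
    negative means the patch came first, zero means same day.
    """
    return (patch_date - exploit_date).days


def attacker_win_rate(pairs):
    """Share of (exploit_date, patch_date) pairs where the exploit came first."""
    gaps = [exploit_patch_gap(e, p) for e, p in pairs]
    if not gaps:
        raise ValueError("need at least one exploit-patch pair")
    return sum(g > 0 for g in gaps) / len(gaps)


# Made-up dates for illustration only.
pairs = [
    (date(2024, 1, 1), date(2024, 1, 31)),   # exploit 30 days before patch
    (date(2024, 3, 10), date(2024, 3, 1)),   # patch 9 days before exploit
    (date(2024, 6, 1), date(2024, 6, 15)),   # exploit 14 days before patch
    (date(2024, 7, 1), date(2024, 7, 1)),    # same day, not counted as a win
]
gaps = [exploit_patch_gap(e, p) for e, p in pairs]
win_rate = attacker_win_rate(pairs)
```

Treating a same-day exploit and patch as a non-win is a choice made for this sketch, not something stated in the source.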

ML Paper Finding

"83.0% before CVSS, 85.5% before CPE"

Model Design

Early Premium Model

25 day-0 features, AUC 0.9913

Source: NVD history database (4M+ records)
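For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen exploited CVE receives a higher score than a randomly chosen non-exploited one, with ties counted as half. The sketch below computes it directly from that definition. It is not the evaluation code behind the 0.9913 figure, the labels and scores are made up, and the pairwise loop is quadratic, so it suits small illustrations only.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive-negative pairs ranked correctly.

    labels: truthy for exploited, falsy otherwise.
    scores: model outputs, higher meaning more likely exploited.
    Tied pairs contribute 0.5 each.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
auc = roc_auc(labels, scores)
# Five of the six positive-negative pairs are ranked correctly, so auc is 5/6.
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.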

The "Aha!" Moment

The Problem

83.0% of exploited CVEs were exploited before a CVSS score was available, 85.5% before CPE data (6 independent sources)

The Question

How do we predict exploitation without waiting for NVD?

The Answer

Day-0 features: vendor history + description similarity

Result: Early Premium (AUC 0.9913) outperforms the full 66-feature model

Key Visualizations

VLAI Confusion Matrix

VLAI-CVSS Agreement

84.5% agreement validates text-based prediction

VLAI Exploited Severity

VLAI Severity Distribution

Severity patterns in exploited CVEs

Exploitation Sources

7 sources (6 independent + NVD exploit tags) consolidated for ground truth

Lifecycle vs Exploit

Lifecycle Analysis

CVE dates vs first exploit timing

Coverage Comparison

Data Coverage

Coverage comparison across sources

Feature Importance

Early Premium model top features

Early Premium ROC

AUC 0.9913 performance curve

CWE Exploitation Rates

Exploitation by vulnerability type

Vendor/Product Features

Risk inheritance visualization

Academic Foundation

Our Prior Work

  • AlBedah (2024) - Transfer Report

    Framework for CVE-CWE-CPE-Exploit-Patch analysis (~246K CVEs)

  • AlBedah, Gashi, Howe (2025) - "Longitudinal Vulnerability Lifecycle Analysis"

    278K CVEs: attackers win 75.8% of exploit-patch races; gap widened to +147.9 days

Key Academic Precedents

  • Shahzad et al. (2012, 2020)

    Multi-dimensional vulnerability characterization (56K CVEs)

  • Suciu & Dumitras - Expected Exploitability

    Dynamic exploitation risk from text features

  • VLAI (Bonhomme & Dulaunoy 2025)

    RoBERTa severity prediction: 84.5% CVSS agreement

  • Garcia et al. (2011) (supplementary)

    OS diversity for intrusion tolerance

Try the CVE Predictor

Experience the ML models built from these research findings
