Hybrid Machine Learning Models for
CVE Exploitation Prediction

Addressing Temporal Bias Through Knowledge Graph Integration and Adaptive Ensembling

Building on our lifecycle analysis of 278,435 vulnerabilities, we developed ML models that predict exploitation at disclosure, before NVD enrichment arrives. The best model reaches an AUC of 0.9913 on real-world data.

The Problem We Solve

Security teams face an overwhelming prioritisation challenge, with 5,000+ new CVEs disclosed monthly. Existing tools suffer from temporal bias: they depend on enrichment data that arrives after disclosure, so they cannot help when teams need them most.

The Temporal Bias Problem

Critical enrichment data arrives too late. CVSS scores, CWE classifications, and exploit references have significant delays [Sonatype 2025].

64%: CVEs lack severity scores at disclosure
83.0%: Exploited before CVSS assignment

Our analysis of 4 million NVD change history records confirms only 23.6% have CVSS scores at publication. Attackers win 75.8% of direct races against vendors [AlBedah et al. 2026].

Our Solution

Predict exploitation before enrichment arrives. We combine three complementary approaches in an adaptive ensemble that adjusts based on data availability.

Description Similarity: Sentence-BERT embeddings match new CVEs to exploited patterns
Vendor/Product History: Historical exploitation rates transfer risk to new CVEs
Knowledge Graph: CVE-CWE-CAPEC-ATT&CK chains provide threat context
Early Premium Model: AUC 0.9913 - works at disclosure time with 25 features
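The description-similarity signal can be sketched as a maximum cosine similarity between a new CVE's embedding and the embeddings of known-exploited CVEs. This is a minimal sketch, not the project's implementation: it assumes embeddings have already been computed (for example by a Sentence-BERT encoder), uses toy 3-dimensional vectors, and borrows the name desc_similarity_max from the feature named on this page.

```python
import math
from typing import Iterable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def desc_similarity_max(new_vec: Sequence[float],
                        exploited_vecs: Iterable[Sequence[float]]) -> float:
    """Highest similarity between a new CVE and any known-exploited CVE."""
    return max((cosine(new_vec, v) for v in exploited_vecs), default=0.0)


# Toy vectors standing in for real sentence embeddings (illustrative only).
exploited = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_cve = [2.0, 0.0, 0.0]
score = desc_similarity_max(new_cve, exploited)
```

Taking the maximum rather than the mean means a single close match to a previously exploited description is enough to raise the signal; whether the project aggregates this way beyond the feature name is an assumption.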

Research Questions Addressed

RQ1

Can we achieve high-accuracy prediction for new CVEs before NVD enrichment completes?

Yes - AUC 0.9913 at disclosure time

RQ2

How should ensemble weights adapt based on data availability?

Empirically learned regime-specific weights

RQ3

What is the quantitative contribution of knowledge graph reasoning?

16.9% weight in MODERATE regime

How It Works

1. CVE Input: Vulnerability Data
2. Data Regime: SPARSE/MODERATE/RICH
3. Model Selection: Adaptive Ensemble
4. Risk Score: 0-100% Probability

Our Models

Best Overall

Early Premium

0.9913 AUC

25 features including vendor rates, version patterns, and description similarity.

Best for: SPARSE regime (71.6%)
Top feature: desc_similarity_max

Full Features

Full MLP

0.9719 AUC

66 features with EPSS, sightings, and ATT&CK-derived signals.

Best for: RICH regime (2.5%)
Architecture: 66-256-128-64-1
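To make the 66-256-128-64-1 shape concrete, here is a minimal forward-pass sketch in plain Python. Only the layer sizes come from this page; the ReLU hidden activations, sigmoid output, He-style random initialisation, and all function names are assumptions for illustration, and the weights are untrained.

```python
import math
import random
from typing import List, Tuple

LAYER_SIZES = [66, 256, 128, 64, 1]  # layer widths as stated on this page

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layers(sizes: List[int], seed: int = 0) -> List[Layer]:
    """Random, untrained weights (assumed He-style scaling) and zero biases."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers: List[Layer], x: List[float]) -> float:
    """ReLU on hidden layers, sigmoid on the single output unit."""
    h = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            h = [sigmoid(v) for v in z]
        else:
            h = [max(0.0, v) for v in z]
    return h[0]


def count_parameters(layers: List[Layer]) -> int:
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


layers = init_layers(LAYER_SIZES)
probability = forward(layers, [0.0] * 66)  # all-zero input gives 0.5 with zero biases
n_params = count_parameters(layers)        # 58,369 weights and biases
```

The parameter count follows directly from the layer sizes: 66×256+256, 256×128+128, 128×64+64, and 64×1+1.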
Interpretable

GNN GraphSAGE

0.9344 AUC

Learns from CVE-CWE-CAPEC-ATT&CK knowledge graph structure.

Graph size: 652K edges
Nodes: 306K CVEs + KG
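For readers unfamiliar with GraphSAGE, a single layer with the mean aggregator can be sketched as follows: each node concatenates its own features with the mean of its neighbours' features, applies a weight matrix, then a ReLU. The toy graph, node names, 2-dimensional features, and weight values below are illustrative assumptions; the project's aggregator choice, depth, and normalisation are not stated on this page.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector when a node has no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_mean_layer(features: Dict[str, Vector],
                    neighbours: Dict[str, List[str]],
                    weight: List[Vector]) -> Dict[str, Vector]:
    """One GraphSAGE-style layer: ReLU(W . [h_v ; mean of neighbour h_u])."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h_v in features.items():
        h_n = mean_vector([features[u] for u in neighbours.get(node, [])], dim)
        concat = h_v + h_n  # list concatenation: self features, then neighbourhood mean
        updated[node] = [max(0.0, sum(w * x for w, x in zip(row, concat)))
                         for row in weight]
    return updated


# Toy graph with placeholder node names (not real identifiers).
features = {"cve": [1.0, 0.0], "cwe": [0.0, 1.0], "capec": [1.0, 1.0]}
neighbours = {"cve": ["cwe"], "cwe": ["cve", "capec"], "capec": ["cwe"]}
# Hand-picked weights that simply add the self and neighbourhood vectors.
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]
hidden = sage_mean_layer(features, neighbours, weight)
```

In a trained model the weight matrix is learned and several such layers are stacked, so that a CVE node's representation absorbs information from CWE, CAPEC, and ATT&CK nodes a few hops away.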

Data Regime Distribution

71.6% of CVEs are SPARSE at publication - they lack EPSS and full NVD enrichment. Our Early Premium model handles this majority case with AUC 0.9913.

SPARSE (71.6%): No CWE, no CPE
MODERATE (25.2%): CWE or CPE present
RICH (3.2%): CVSS + CWE + CPE
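The regime definitions above translate into a small classification rule. This is a minimal sketch under stated assumptions: the function name and argument shapes are hypothetical, and a record with a CVSS score but neither CWE nor CPE is treated as SPARSE, which is one reading of "No CWE, no CPE".

```python
from typing import Optional, Sequence


def classify_regime(cvss: Optional[float],
                    cwes: Sequence[str],
                    cpes: Sequence[str]) -> str:
    """Map the enrichment available for a CVE to SPARSE, MODERATE, or RICH."""
    has_cvss = cvss is not None
    has_cwe = len(cwes) > 0
    has_cpe = len(cpes) > 0
    if has_cvss and has_cwe and has_cpe:
        return "RICH"        # CVSS + CWE + CPE all present
    if has_cwe or has_cpe:
        return "MODERATE"    # at least one of CWE or CPE present
    return "SPARSE"          # no CWE and no CPE


# Illustrative records (the CPE string is a made-up example).
example_cpe = ["cpe:2.3:a:example:product:1.0:*:*:*:*:*:*:*"]
at_disclosure = classify_regime(None, [], [])
partly_enriched = classify_regime(None, ["CWE-94"], [])
fully_enriched = classify_regime(9.8, ["CWE-94"], example_cpe)
```

Checking the RICH condition first keeps the three regimes mutually exclusive.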

Key Findings

83.0% of exploited CVEs were exploited before receiving CVSS scores

17.9% importance weight for description similarity, the top feature

35 days median time from CVE reservation to publication

327K+ CVEs analyzed with 40K exploited instances

Top CWE Exploitation Rates

Certain weakness types are exploited more frequently than others. These are the top 5 most exploited CWE categories.

The Problem & Our Solution

Dataset at a Glance

Total CVEs: 327,971
Exploited CVEs: 40,373
Exploit Sources: 7
Exploitation Rate: 12.39%
Vendors Analysed: 2,658
Products Tracked: 8,737
Database Records: 372M+

Building on Prior Research

This work extends our "Longitudinal Vulnerability Lifecycle Analysis" (278,435 CVEs, 1999-2025).

Lifecycle Finding

"Attackers win 75.8% of exploit-patch races"

ML Feature: vendor_exploit_rate

Lifecycle Finding

"95% same-vendor version overlap"

ML Feature: version_risk_delta

VLAI Precedent

84.5% CVSS agreement from descriptions

ML Feature: desc_similarity_max (17.9%)

Exploitation Evidence Sources

We consolidate exploitation evidence from seven distinct sources - a 27x improvement over using CISA KEV alone (1,488 vs 40,373 CVEs).

ExploitDB: 22,598 CVEs
NVD Exploit Tags: 25,073 CVEs
CISA KEV: 1,488 CVEs
VulnCheck KEV: 4,482 CVEs
Threat Intelligence Sightings: 3,021 CVEs
GitHub Advisories: 2,841 CVEs
SSVC Active: 1,890 CVEs
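Consolidation across sources amounts to a set union over CVE identifiers, so a CVE reported by several sources is counted once. The sketch below uses placeholder identifiers and a hypothetical consolidate helper; only the per-source counts and the 40,373 and 1,488 totals are taken from this page.

```python
from typing import Dict, Set


def consolidate(sources: Dict[str, Set[str]]) -> Set[str]:
    """Union of CVE IDs across sources; duplicates collapse automatically."""
    merged: Set[str] = set()
    for cve_ids in sources.values():
        merged |= cve_ids
    return merged


# Toy data with placeholder IDs (not real CVEs).
toy_sources = {
    "ExploitDB": {"CVE-A", "CVE-B"},
    "CISA KEV": {"CVE-B", "CVE-C"},
    "VulnCheck KEV": {"CVE-C"},
}
unique_exploited = consolidate(toy_sources)  # three unique IDs from five reports

# Figures reported on this page.
reported_counts = [22_598, 25_073, 1_488, 4_482, 3_021, 2_841, 1_890]
sum_of_sources = sum(reported_counts)  # 61,393 source-level reports
coverage_gain = 40_373 / 1_488         # roughly 27x over CISA KEV alone
```

The per-source counts sum to 61,393, well above the 40,373 unique exploited CVEs, which shows how much the sources overlap.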

NVD Enrichment Timeline

Our analysis of 4,005,442 NVD change history records across 322,763 CVEs reveals critical enrichment delays:

Data Type  | At Pub | 7 days | 30 days | 90 days
CVSS Score | 23.6%  | 55.2%  | 68.8%   | 72.6%
CPE Config | 2.6%   | 38.3%  | 55.2%   | 60.4%
CWE Class  | 22.5%  | 51.5%  | 64.1%   | 68.1%
References | 0.1%   | 3.2%   | 5.4%    | 6.8%
83.0%: Exploited before CVSS assignment
85.5%: Exploited before CPE added
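Coverage figures of this kind can be derived from two dates per CVE: the publication date and the date a field first appears in the change history. The following is a minimal sketch with toy dates; the record layout, the function name, and the reading of "At Pub" as a delay of zero days or less are assumptions.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

HORIZONS_DAYS = [0, 7, 30, 90]  # "At Pub", 7 days, 30 days, 90 days

# (publication date, date the field was first added, or None if never added)
Record = Tuple[date, Optional[date]]


def coverage_by_horizon(records: List[Record]) -> Dict[int, float]:
    """Fraction of CVEs whose field was present within each horizon."""
    total = len(records)
    coverage = {}
    for horizon in HORIZONS_DAYS:
        covered = sum(
            1 for published, added in records
            if added is not None and (added - published).days <= horizon
        )
        coverage[horizon] = covered / total if total else 0.0
    return coverage


# Toy records (illustrative dates only).
toy_records = [
    (date(2024, 1, 1), date(2024, 1, 1)),   # scored on publication day
    (date(2024, 1, 1), date(2024, 1, 6)),   # scored after 5 days
    (date(2024, 1, 1), date(2024, 2, 15)),  # scored after 45 days
    (date(2024, 1, 1), None),               # never scored
]
cvss_coverage = coverage_by_horizon(toy_records)
```

Because every CVE stays in the denominator, records that never receive the field cap the coverage below 100% at any horizon, as in the 90-day column above.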

The Temporal Bias Problem

"By the time a CVE has enough data for traditional ML models to work effectively, the window for proactive defense has often passed."

A comprehensive survey of 82 vulnerability prioritisation studies identifies temporal bias as a persistent challenge - only 17 use predictive metrics, and only 22 combine exploitability with contextual factors [Jiang et al. 2025].

Key Race Statistics:

  • 75.8% - Attackers win direct races against vendors [AlBedah et al. 2026]
  • +147.9 days - Exploit-to-patch gap widening (2020-2024)
  • 35 days - Median CVE reservation to publication gap
  • 31.7% - Exploitation sightings occur BEFORE CVE publication
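A race statistic such as the attacker win rate can be sketched as a comparison of two dates per CVE. The definition used here (the attacker wins when the first exploit predates the patch, with ties going to the vendor) is an assumption for illustration; the cited analysis may define a race differently.

```python
from datetime import date
from typing import List, Tuple

# (first exploit date, patch date) for CVEs where both are known
Race = Tuple[date, date]


def attacker_win_rate(races: List[Race]) -> float:
    """Share of races in which the exploit appeared strictly before the patch."""
    if not races:
        return 0.0
    wins = sum(1 for exploit_date, patch_date in races if exploit_date < patch_date)
    return wins / len(races)


# Toy races (illustrative dates only).
toy_races = [
    (date(2023, 3, 1), date(2023, 3, 20)),   # attacker first
    (date(2023, 5, 10), date(2023, 5, 2)),   # vendor first
    (date(2023, 7, 4), date(2023, 9, 1)),    # attacker first
    (date(2023, 8, 15), date(2023, 8, 30)),  # attacker first
]
win_rate = attacker_win_rate(toy_races)
```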

Key Contributions

Our research addresses temporal bias through seven innovations:

1. Multi-Source Exploitation Intelligence

Consolidated 7 sources to identify 40,373 unique exploited CVEs - 27x improvement over CISA KEV alone.

2. Temporal Lifecycle Features

DE-subset (37,686 CVEs), DP-subset (54,834 CVEs), DPE-subset (2,294 CVEs) - 84.7% improvement over prior work.

3. Vendor & Product Risk Inheritance

Historical rates for 2,658 vendors and 8,737 products enable immediate risk scoring for new CVEs.
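Risk inheritance can be illustrated as a lookup of a vendor's historical exploitation rate, computed from past CVEs. This is a sketch, not the project's feature pipeline: the helper names are hypothetical, the vendors are placeholders, and falling back to the dataset-wide 12.39% rate for unseen vendors is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BASE_RATE = 0.1239  # dataset-wide exploitation rate reported on this page


def vendor_exploit_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-vendor share of past CVEs that were exploited."""
    totals: Dict[str, int] = defaultdict(int)
    exploited: Dict[str, int] = defaultdict(int)
    for vendor, was_exploited in history:
        totals[vendor] += 1
        exploited[vendor] += int(was_exploited)
    return {vendor: exploited[vendor] / totals[vendor] for vendor in totals}


def vendor_exploit_rate(vendor: str, rates: Dict[str, float],
                        base_rate: float = BASE_RATE) -> float:
    """Rate for a known vendor, or the base rate for an unseen one (assumed fallback)."""
    return rates.get(vendor, base_rate)


# Toy history with placeholder vendor names.
history = [
    ("vendor_a", True), ("vendor_a", False), ("vendor_a", True), ("vendor_a", False),
    ("vendor_b", False), ("vendor_b", False),
]
rates = vendor_exploit_rates(history)
known = vendor_exploit_rate("vendor_a", rates)
unseen = vendor_exploit_rate("vendor_z", rates)
```

To avoid leaking future information, such rates would need to be computed only from CVEs published before the one being scored.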

4. Early Premium Model (AUC 0.9913)

Using semantic similarity, the 25 disclosure-time features outperform the 66-feature MLP (AUC 0.9719).

5. Knowledge Graph Integration

CVE→CWE→CAPEC→ATT&CK graph with 616 CWE-to-ATT&CK mappings for threat-aware prediction.
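The CVE→CWE→CAPEC→ATT&CK chain can be walked with three lookups. The sketch below uses placeholder identifiers rather than real mappings, and the dictionary layout and attack_chains helper are assumptions for illustration.

```python
from typing import Dict, List, Set

# Placeholder mappings (not real CVE, CWE, CAPEC, or ATT&CK identifiers).
CVE_TO_CWE: Dict[str, List[str]] = {"CVE-X": ["CWE-1"]}
CWE_TO_CAPEC: Dict[str, List[str]] = {"CWE-1": ["CAPEC-1", "CAPEC-2"]}
CAPEC_TO_ATTACK: Dict[str, List[str]] = {
    "CAPEC-1": ["T-1"],
    "CAPEC-2": ["T-1", "T-2"],
}


def attack_chains(cve_id: str) -> List[List[str]]:
    """Every CVE -> CWE -> CAPEC -> technique path reachable from a CVE."""
    chains = []
    for cwe in CVE_TO_CWE.get(cve_id, []):
        for capec in CWE_TO_CAPEC.get(cwe, []):
            for technique in CAPEC_TO_ATTACK.get(capec, []):
                chains.append([cve_id, cwe, capec, technique])
    return chains


def reachable_techniques(cve_id: str) -> Set[str]:
    """Distinct techniques at the end of any chain."""
    return {chain[-1] for chain in attack_chains(cve_id)}


chains = attack_chains("CVE-X")
techniques = reachable_techniques("CVE-X")
```

Each returned chain doubles as a human-readable explanation of why a CVE is linked to a given technique.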

6. Principled Uncertainty Quantification

Monte Carlo Dropout for Bayesian estimation with Expected Calibration Error (ECE) of 0.045.
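Both ideas can be sketched in a few lines. Monte Carlo Dropout keeps dropout active at prediction time and summarises repeated stochastic passes by their mean and spread; ECE bins predictions by probability and averages the gap between predicted probability and observed positive rate. The one-layer logistic model, dropout rate, bin count, and this binary form of ECE are assumptions for illustration, not the project's configuration.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mc_dropout(x: Sequence[float], weights: Sequence[float], bias: float,
               rate: float = 0.2, passes: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation over stochastic passes with input dropout."""
    rng = random.Random(seed)
    samples = []
    for _ in range(passes):
        # Inverted dropout: zero each input with probability `rate`, rescale the rest.
        kept = [v / (1.0 - rate) if rng.random() >= rate else 0.0 for v in x]
        samples.append(sigmoid(sum(w * v for w, v in zip(weights, kept)) + bias))
    return mean(samples), pstdev(samples)


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted mean of |observed positive rate - mean predicted probability| per bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            avg_prob = mean(p for p, _ in members)
            pos_rate = mean(y for _, y in members)
            ece += len(members) / total * abs(pos_rate - avg_prob)
    return ece


# Toy inputs (illustrative only).
risk_mean, risk_std = mc_dropout([1.0, 2.0, 0.5], [0.8, -0.3, 1.2], bias=-0.5)
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

A wide spread across passes flags predictions the model is unsure about, while a low ECE indicates that predicted probabilities track observed exploitation frequencies.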

7. Empirically Learned Ensemble Weights

Logistic regression learns optimal regime-specific weights instead of arbitrary assignment.

Adaptive Ensemble Weights

Weights are learned via logistic regression on validation data, adapting to data availability:

Regime           | ML    | KG    | Similarity
SPARSE (16.3%)   | 33.3% | 0%    | 66.7%
MODERATE (73.6%) | 41.5% | 16.9% | 41.6%
RICH (10.0%)     | 39.9% | 27.2% | 33.0%

Note: Similarity leads in the SPARSE regime (no other signals are available), while ML leads in the RICH regime (full data available).
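One way to apply the regime weights above is a weighted average of the three component scores. This is a sketch under assumptions: the page states that the weights are learned by logistic regression but not how they are combined at inference, so the linear combination, the renormalisation step, and the function name are illustrative.

```python
from typing import Dict

# Weights copied from the table above, expressed as fractions.
REGIME_WEIGHTS: Dict[str, Dict[str, float]] = {
    "SPARSE":   {"ml": 0.333, "kg": 0.000, "similarity": 0.667},
    "MODERATE": {"ml": 0.415, "kg": 0.169, "similarity": 0.416},
    "RICH":     {"ml": 0.399, "kg": 0.272, "similarity": 0.330},
}


def ensemble_score(regime: str, ml: float, kg: float, similarity: float) -> float:
    """Weighted average of component scores for the given data regime."""
    w = REGIME_WEIGHTS[regime]
    # Renormalise because the published percentages are rounded
    # (the RICH row sums to 100.1%).
    total = sum(w.values())
    combined = w["ml"] * ml + w["kg"] * kg + w["similarity"] * similarity
    return combined / total


# Toy component scores in [0, 1] (illustrative only).
sparse_risk = ensemble_score("SPARSE", ml=0.2, kg=0.9, similarity=0.8)
rich_risk = ensemble_score("RICH", ml=0.5, kg=0.5, similarity=0.5)
```

In the SPARSE example the knowledge graph score is ignored entirely, so the result is driven mostly by description similarity.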

Practical Impact

For Security Operations Centres (SOCs)

Immediate risk scoring for new CVEs - reducing prioritisation delay from weeks to hours.

For Vulnerability Management Teams

CWE-based exploitation rates guide weakness-class prioritisation (e.g., CWE-94 has an 18.61% exploitation rate).

For Threat Intelligence Analysts

Knowledge graph reasoning chains provide interpretable explanations for human-in-the-loop validation.

Comparison with Existing Approaches

Approach     | Early   | KG  | Adaptive | UQ
CVSS         | Delayed | -   | -        | -
EPSS         | Partial | -   | -        | -
GraphCVE     | Yes     | Yes | -        | -
Khader GNN   | Yes     | Yes | -        | -
Our Approach | Yes     | Yes | Yes      | Yes

KG = Knowledge Graph, UQ = Uncertainty Quantification