Detailed comparison of all ML models from the CVE exploitation prediction paper
- Best overall AUC: Early Premium (0.9913)
- Deep Neural Network (66-feature MLP): AUC 0.9719
- Knowledge Graph Reasoning (GraphSAGE): AUC 0.9344
71.6% of CVEs fall into the SPARSE regime at publication time, requiring the Early Premium model, which works without EPSS, sightings, or CVSS data.
P_final = w_ml × P_ml + w_kg × P_kg + w_sim × P_sim
Best performance with minimal data - AUC 0.9913
Named because it works at the earliest stage of CVE lifecycle (before NVD enrichment) while achieving premium (best) performance. It outperforms even the full 66-feature MLP because it focuses on the most predictive signals.
Weighted Binary Cross-Entropy with class weights to handle imbalanced data (only ~5% of CVEs are exploited).
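A minimal NumPy sketch of a weighted binary cross-entropy (the paper's exact class weights are not stated; `pos_weight = n_negative / n_positive` is a common choice for this kind of imbalance):

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight, eps=1e-7):
    """Binary cross-entropy where positive (exploited) examples are up-weighted."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

# Toy labels with 2 positives out of 20 (~10% exploited for illustration)
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
pos_weight = (y == 0).sum() / (y == 1).sum()  # 9.0 here
```

With this weighting, a confidently wrong prediction on an exploited CVE costs roughly `pos_weight` times more than the same mistake on a non-exploited one.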
Early Premium uses a hierarchical knowledge graph capturing vendor, product, and version relationships. This enables risk inheritance: a vendor with historically exploited products signals higher risk for new CVEs.
The knowledge graph contributes the following features:

- vendor_exploit_rate, vendor_cve_count, vendor_avg_cvss, vendor_risk_score
- product_exploit_rate, product_cve_count, product_avg_severity, product_cwe_diversity
- version_exploit_rate, version_exploit_rate_before, version_exploit_rate_after, version_risk_delta

The model captures how vulnerability patterns propagate across versions:
version_risk_delta = version_exploit_rate_before - version_exploit_rate_after (positive = improving)
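A hedged pandas sketch of how such hierarchy-level rates could be derived; the table and column names here are hypothetical, not the paper's schema:

```python
import pandas as pd

# Hypothetical CVE table: real column names may differ
cves = pd.DataFrame({
    "vendor":    ["acme", "acme", "acme", "globex"],
    "product":   ["srv",  "srv",  "app",  "db"],
    "exploited": [1,      0,      1,      0],
})

# Risk inheritance: aggregate exploitation history at each hierarchy level
vendor_exploit_rate = cves.groupby("vendor")["exploited"].mean()
product_exploit_rate = cves.groupby(["vendor", "product"])["exploited"].mean()
```

A new "acme" CVE would inherit the vendor's historical rate (2 of 3 exploited) as a feature, even before any CVE-specific enrichment exists.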
Deep learning with all 66 features - AUC 0.9719
Used for RICH regime CVEs (2.5% of total) that have full NVD enrichment: EPSS scores, CVSS data, sightings, and complete ATT&CK mappings.
More features do not always mean better performance: the Early Premium model achieves a higher AUC because it focuses on the most predictive signals and avoids noise from less useful features.
Knowledge graph reasoning - AUC 0.9344
Uses GraphSAGE (Graph Sample and Aggregate) to learn from the CVE knowledge graph. Each CVE node aggregates information from its neighboring CWE, CAPEC, and ATT&CK technique nodes to create a rich representation.
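A toy NumPy sketch of the mean-aggregation step GraphSAGE performs; the real model learns `W_self`/`W_neigh` during training and runs over the full CVE graph, so all names and values here are illustrative:

```python
import numpy as np

def sage_mean_layer(h, neighbors, W_self, W_neigh):
    """One GraphSAGE layer with mean aggregation:
    h'_v = ReLU(W_self @ h_v + W_neigh @ mean(h_u for u in N(v)))."""
    out = np.empty((h.shape[0], W_self.shape[0]))
    for v, nbrs in neighbors.items():
        agg = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        out[v] = np.maximum(0, W_self @ h[v] + W_neigh @ agg)
    return out

# Toy graph: node 0 is a CVE; nodes 1-2 are its CWE / ATT&CK neighbors
h = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = np.eye(2)
h1 = sage_mean_layer(h, neighbors, W, W)  # CVE node now mixes in neighbor info
```

After one layer, the CVE node's representation already blends its own features with the average of its CWE/CAPEC/ATT&CK neighbors; stacking layers widens the neighborhood.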
All count features (sightings, days, techniques) are log-transformed using log1p(x).
Binary flags are set based on presence in high-risk sets. Severity score is normalized to 0-1 using weighted tactic contributions.
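The count transform can be sketched directly; `log1p` keeps zero counts at zero while compressing heavy-tailed values:

```python
import numpy as np

# Skewed count features (e.g. sightings, days, technique counts)
counts = np.array([0, 1, 9, 99, 999])
log_counts = np.log1p(counts)  # log(1 + x), safe at zero

# np.expm1 inverts the transform if raw counts are needed back
recovered = np.expm1(log_counts)
```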
- Explainability: predictions can be traced through graph paths, e.g. "This CVE is risky because CWE-89 links to T1190 (Exploit Public-Facing Application) used by APT28."
- Generalization: the model benefits from the entire graph structure; new CVEs inherit knowledge from similar CWEs that were exploited before.
- Threat-actor context: the graph connects CVEs to real threat groups and malware, so if APT29 uses techniques linked to a CVE's CWE, risk increases.
The final prediction combines three components with regime-specific weights learned through logistic regression.
- ML Model Prediction: Early Premium or MLP, chosen by regime
- Knowledge Graph Score: CWE rates + ATT&CK + vendor history
- Similarity Score: Sentence-BERT description similarity
```python
# From: 10_adaptive_risk_model.py
# Learn optimal weights via logistic regression on component scores
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stack the three per-CVE component scores (ML, KG, similarity)
X = np.column_stack([ml_scores, kg_scores, similarity_scores])
y = exploitation_labels

# Train logistic regression to learn the optimal combination
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Extract learned weights (normalized)
raw_weights = model.coef_[0]
weights = raw_weights / raw_weights.sum()
# Result: [0.442, 0.312, 0.246] for global weights
# Per-regime weights vary based on data availability
```
| Regime | ML Weight | KG Weight | Similarity Weight | Dominant |
|---|---|---|---|---|
| SPARSE | 33.3% | 0% | 66.7% | Similarity |
| MODERATE | 41.5% | 16.9% | 41.6% | Balanced |
| RICH | 39.9% | 27.2% | 33.0% | ML Model |
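The per-regime blend can then be applied as a simple lookup. The weight values below are taken from the table; the `REGIME_WEIGHTS` and `combine` names are illustrative, not from the paper's code:

```python
# Per-regime component weights from the table: (ML, KG, similarity)
REGIME_WEIGHTS = {
    "SPARSE":   (0.333, 0.000, 0.667),
    "MODERATE": (0.415, 0.169, 0.416),
    "RICH":     (0.399, 0.272, 0.330),
}

def combine(regime, p_ml, p_kg, p_sim):
    """P_final = w_ml*P_ml + w_kg*P_kg + w_sim*P_sim for the CVE's regime."""
    w_ml, w_kg, w_sim = REGIME_WEIGHTS[regime]
    return w_ml * p_ml + w_kg * p_kg + w_sim * p_sim
```

Note that in the SPARSE regime the KG weight is zero, so the prediction reduces to a blend of the ML score and description similarity.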
Uses Sentence-BERT (all-MiniLM-L6-v2) to compute semantic similarity between the target CVE's description and descriptions of known exploited CVEs.
desc_similarity_max is the single most important feature because CVE descriptions contain strong signals about exploitability: language similar to past exploited CVEs indicates a similar attack surface.
```python
# Similarity calculation
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode the CVE description
embedding = model.encode(cve_description)  # 384-dim vector

# Compare against the index of known exploited CVE embeddings
similarities = cosine_similarity([embedding], exploited_embeddings)[0]

# P_sim blends the top-k similarities
top10 = np.sort(similarities)[-10:]
max_similarity = similarities.max()
mean_top10 = top10.mean()
# One plausible reading of "weighted average": weight each top-10
# similarity by itself (the paper's exact definition is not shown)
weighted_avg = np.average(top10, weights=top10)
p_sim = 0.5 * max_similarity + 0.3 * mean_top10 + 0.2 * weighted_avg
```