Ensemble Weights Explainer

How weights were learned and where they're used

How Weights Were Learned

The Problem

We have three prediction components:

  • ML: neural network prediction
  • KG: knowledge graph risk score
  • Sim: description similarity score

How do we combine them optimally? We learn the weights from data!

The Solution: Logistic Regression

# Stack component scores into a feature matrix (one row per CVE)
import numpy as np
from scipy.special import softmax
from sklearn.linear_model import LogisticRegression

X = np.column_stack([ml_scores, kg_scores, sim_scores])
y = exploitation_labels
# Fit logistic regression, then normalize coefficients into weights
model = LogisticRegression().fit(X, y)
weights = softmax(model.coef_[0])  # coef_ has shape (1, 3)

The learned coefficients tell us how much each component contributes to the final prediction.
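As a self-contained sketch of that fit-and-normalize step (synthetic scores and labels here, not the project's real data):

```python
import numpy as np
from scipy.special import softmax
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Synthetic component scores in [0, 1]
ml_scores = rng.random(n)
kg_scores = rng.random(n)
sim_scores = rng.random(n)
# Synthetic labels: exploitation driven mostly by ML and similarity
logits = 3.0 * ml_scores + 0.5 * kg_scores + 2.0 * sim_scores - 2.75
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X = np.column_stack([ml_scores, kg_scores, sim_scores])
model = LogisticRegression().fit(X, y)
weights = softmax(model.coef_[0])  # normalize so weights sum to 1
print(weights.round(3))
```

Because the synthetic labels were generated with ML and similarity dominating, the normalized weights recover that ordering.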

Learned Weights by Regime

SPARSE (71.6% of CVEs)
  • ML Score: 33.3%
  • KG Score: 0.0%
  • Similarity: 66.7%

Similarity leads! When data is limited, matching descriptions to known exploits is most reliable.

MODERATE (25.2% of CVEs)
  • ML Score: 41.5%
  • KG Score: 16.9%
  • Similarity: 41.6%

More balanced! ML and similarity contribute nearly equally, and the knowledge graph starts to matter once partial data is available.

RICH (3.2% of CVEs)
  • ML Score: 39.9%
  • KG Score: 27.2%
  • Similarity: 33.0%

ML leads! With full EPSS and sightings, the neural network is highly accurate.
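Because each regime's weights come from a softmax over the logistic-regression coefficients, every row should sum to 1; the tiny deviations in the tables above are just rounding. A quick check:

```python
# Weights per regime as (ML, KG, Similarity), from the tables above
REGIME_WEIGHTS = {
    "SPARSE":   [0.333, 0.000, 0.667],
    "MODERATE": [0.415, 0.169, 0.416],
    "RICH":     [0.399, 0.272, 0.330],
}
for regime, w in REGIME_WEIGHTS.items():
    # Allow rounding error from the 3-decimal display
    assert abs(sum(w) - 1.0) < 0.005, regime
```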

Interactive Weight Adjuster

Adjust the weights manually to see how the combined prediction changes, then compare your settings with the learned weights to see what the optimization favors.

Sample Prediction

ML Score: 0.72
KG Score: 0.45
Similarity Score: 0.68
Combined Prediction:
Formula: w_ml x 0.72 + w_kg x 0.45 + w_sim x 0.68
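For instance, applying the SPARSE-regime weights from the tables above to these sample scores:

```python
# SPARSE-regime learned weights (ML, KG, Similarity)
w_ml, w_kg, w_sim = 0.333, 0.0, 0.667
ml, kg, sim = 0.72, 0.45, 0.68  # sample component scores

combined = w_ml * ml + w_kg * kg + w_sim * sim
print(round(combined, 3))  # → 0.693
```

In this regime the KG score is ignored entirely (weight 0), so the prediction leans mostly on similarity.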

Where Weights Are Used

1. Prediction Routing

Based on the CVE's data regime, the appropriate weights are selected automatically. SPARSE CVEs use similarity-heavy weights.

2. Score Combination

The final prediction is: w_ml x ML + w_kg x KG + w_sim x Sim. Higher weight = higher contribution to final score.
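Steps 1 and 2 together amount to a lookup plus a weighted sum. A minimal sketch (the upstream logic that classifies a CVE as SPARSE/MODERATE/RICH is assumed to exist):

```python
# Learned weights per data regime (values from the tables above)
REGIME_WEIGHTS = {
    "SPARSE":   {"ml": 0.333, "kg": 0.000, "sim": 0.667},
    "MODERATE": {"ml": 0.415, "kg": 0.169, "sim": 0.416},
    "RICH":     {"ml": 0.399, "kg": 0.272, "sim": 0.330},
}

def combine(regime: str, ml: float, kg: float, sim: float) -> float:
    """Final prediction: w_ml x ML + w_kg x KG + w_sim x Sim."""
    w = REGIME_WEIGHTS[regime]
    return w["ml"] * ml + w["kg"] * kg + w["sim"] * sim
```

For example, `combine("SPARSE", 0.72, 0.45, 0.68)` reproduces the sample prediction above.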

3. Confidence Weighting

Components with higher learned weights indicate they're more reliable for that regime. This informs which signals to trust.