CVSS 3.1 Scores Were Never Designed to Set Remediation Priority

Nearly every vulnerability management program uses CVSS scores, and most of them use those scores incorrectly. The cause is not careless teams. It is that the original design intent of CVSS has been obscured by two decades of tooling that treats Base Score as a remediation priority signal.

It isn't one. Understanding the gap between what CVSS actually measures and what security teams need to make remediation decisions is the prerequisite to building a prioritization process that reflects real risk rather than abstract severity.

What CVSS 3.1 actually measures

The CVSS 3.1 Base Score is a function of eight metrics: Attack Vector, Attack Complexity, Privileges Required, User Interaction, Scope, and the three impact metrics of the CIA triad (Confidentiality, Integrity, Availability). These metrics describe a vulnerability's intrinsic properties in an idealized attacker scenario: specifically, the reasonable worst case, with exploitation under the conditions most favorable to the attacker.

Attack Vector = Network means the vulnerability is exploitable remotely over a network connection. It does not mean your vulnerable instance is reachable from wherever the attacker sits. Attack Complexity = Low means exploitation doesn't require special conditions. It doesn't mean the preconditions for an attack exist in your environment. A CVE scores 9.8 because, if an attacker had network access to a vulnerable system, exploitation would be trivially easy and the impact would be catastrophic. None of that speaks to whether an attacker can reach your specific instance of that system.

The CVSS specification is explicit about this. CVSS Base Scores are designed to be consistent across all organizations — the same CVE gets the same Base Score regardless of who's running the software. That universality is the feature. It's also why Base Score alone can't drive remediation priority: it's intentionally context-free.
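To make the context-free point concrete, here is a minimal sketch of the Base Score calculation using the metric weights and rounding rule published in the CVSS 3.1 specification. The function and variable names are our own, and the parser assumes a well-formed vector string with all eight base metrics present. Note that nothing about your environment appears anywhere in the inputs.

```python
import math

# Metric weights from the CVSS 3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the Base Score from a vector such as 'CVSS:3.1/AV:N/...'."""
    # Skip the 'CVSS:3.1' prefix, then split each 'KEY:VALUE' pair.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    # Impact Sub-Score combines the three CIA impact metrics.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

The same vector produces the same number for every organization that runs the affected software, which is exactly the universality described above.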

Temporal Score and Environmental Score: the missing pieces

CVSS 3.1 defines two score types beyond Base Score, designed to add what Base Score leaves out: Temporal Score for conditions that change over time, and Environmental Score for organization-specific context.

Temporal Score adjusts Base Score using three metrics: Exploit Code Maturity (how mature is the available exploit code, from unproven to fully weaponized?), Remediation Level (has an official vendor patch been released, or only a workaround?), and Report Confidence (how well confirmed are the vulnerability's existence and technical details?). A CVSS 9.8 with no public exploit and an official patch available scores lower on Temporal than the same CVE with a weaponized exploit kit circulating in active ransomware campaigns.
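As a rough illustration, the Temporal Score in CVSS 3.1 is the Base Score multiplied by three weights (Exploit Code Maturity, Remediation Level, Report Confidence) and rounded up. The sketch below uses the weights from the specification; the function names are ours.

```python
import math

# Temporal metric weights from the CVSS 3.1 specification ("X" = Not Defined).
EXPLOIT_CODE_MATURITY = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}
REMEDIATION_LEVEL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}
REPORT_CONFIDENCE = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def temporal_score(base: float, e: str = "X", rl: str = "X", rc: str = "X") -> float:
    """Scale a Base Score by the three temporal weights."""
    return roundup(
        base
        * EXPLOIT_CODE_MATURITY[e]
        * REMEDIATION_LEVEL[rl]
        * REPORT_CONFIDENCE[rc]
    )


# No known exploit code (E:U), official fix available (RL:O), confirmed report (RC:C).
print(temporal_score(9.8, e="U", rl="O", rc="C"))  # 8.5
# Mature exploit code (E:H), no fix available (RL:U), confirmed report (RC:C).
print(temporal_score(9.8, e="H", rl="U", rc="C"))  # 9.8
```

Because every weight is at most 1.0, Temporal Score can only hold a Base Score steady or lower it.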

Environmental Score goes further: it lets each organization apply its own Modified Base Metrics and Security Requirements to account for how it has actually deployed the software. A security control that prevents network access to a vulnerable service can be encoded in the Environmental Score (for example, through a Modified Attack Vector) to reflect reduced actual risk.
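A simplified sketch of that adjustment follows. It uses the CVSS 3.1 Environmental formula for the Scope Unchanged case only, with all Temporal metrics left as Not Defined, so it is a partial illustration rather than a full implementation. The function name and argument layout are our own.

```python
import math

# Weights from the CVSS 3.1 specification (Scope Unchanged values for PR).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
REQUIREMENT = {"X": 1.0, "H": 1.5, "M": 1.0, "L": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def environmental_score(mav, mac, mpr, mui, mc, mi, ma, cr="X", ir="X", ar="X"):
    """Environmental Score, Scope Unchanged, Temporal metrics Not Defined."""
    # Modified Impact Sub-Score, weighted by the security requirements.
    miss = min(
        1
        - (1 - REQUIREMENT[cr] * CIA[mc])
        * (1 - REQUIREMENT[ir] * CIA[mi])
        * (1 - REQUIREMENT[ar] * CIA[ma]),
        0.915,
    )
    modified_impact = 6.42 * miss
    modified_exploitability = 8.22 * AV[mav] * AC[mac] * PR[mpr] * UI[mui]
    if modified_impact <= 0:
        return 0.0
    # The temporal multipliers are all 1.0 here, so the outer roundup is a no-op.
    return roundup(roundup(min(modified_impact + modified_exploitability, 10)))


# Same 9.8 vulnerability as published (Attack Vector: Network).
print(environmental_score("N", "L", "N", "N", "H", "H", "H"))  # 9.8
# Segmentation limits exposure to the adjacent network (Modified Attack Vector: Adjacent).
print(environmental_score("A", "L", "N", "N", "H", "H", "H"))  # 8.8
```

The drop from 9.8 to 8.8 is the score-level expression of a compensating control, and computing it requires knowing that the control exists.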

The reason almost no one uses Temporal or Environmental Scores in practice: they require data. Exploit availability data requires a live threat intelligence feed that maps CVEs to real-world exploitation evidence. Environmental Score requires structured knowledge of your asset topology, network segmentation, and deployed compensating controls. Most organizations have none of this in a queryable form. So they fall back to Base Score as the de facto sorting key — which is using a universal severity rating as a context-specific prioritization signal.

The EPSS model: a different approach to exploit prediction

FIRST (Forum of Incident Response and Security Teams), which maintains CVSS, also developed a separate scoring model called EPSS — Exploit Prediction Scoring System. EPSS takes a fundamentally different approach: instead of measuring vulnerability properties, it estimates the probability that a given CVE will be exploited in the wild within 30 days, based on historical exploitation data and current threat intelligence signals.

EPSS scores range from 0.0 to 1.0 and are read as probabilities. A CVE scoring 0.94 has an estimated 94% chance of seeing exploitation activity in the next 30 days, which in practice usually means it is already being exploited or exploitation is imminent. A CVE scoring 0.003 may carry a CVSS 9.8 Base Score, yet its estimated chance of exploitation in that window is 0.3%.

The implication is significant. EPSS research has consistently found that the majority of CVEs in NVD have EPSS scores below 0.05 — meaning most published vulnerabilities see no meaningful exploitation activity in the real world. A small fraction of CVEs account for the large majority of actual exploitation incidents. CVSS Base Score is a poor predictor of which fraction a given CVE falls into.

We're not saying EPSS replaces CVSS — both carry useful information that the other doesn't capture. CVSS tells you how bad exploitation would be. EPSS tells you how likely exploitation is. A comprehensive prioritization model needs both.
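One simple way to hold both signals at once is a two-by-two triage grid. The sketch below is illustrative only: the cutoffs (7.0 for CVSS, 0.10 for EPSS) and the quadrant labels are hypothetical choices of ours, not values from either specification.

```python
def triage_quadrant(
    cvss_base: float,
    epss: float,
    cvss_threshold: float = 7.0,   # hypothetical cutoff for "severe"
    epss_threshold: float = 0.10,  # hypothetical cutoff for "likely"
) -> str:
    """Place a finding in a severity-by-likelihood quadrant."""
    severe = cvss_base >= cvss_threshold
    likely = epss >= epss_threshold
    if severe and likely:
        return "act now"
    if likely:
        return "likely but lower impact"
    if severe:
        return "severe but unlikely"
    return "backlog"


print(triage_quadrant(9.8, 0.003))  # severe but unlikely
print(triage_quadrant(7.4, 0.31))   # act now
```

A pure CVSS sort would put the first finding above the second; the grid shows why that ordering is questionable.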

The CISA KEV catalog as a hard signal

The CISA Known Exploited Vulnerabilities catalog is the closest thing the industry has to a ground truth list of CVEs that have been actively exploited in real attacks against real organizations. CISA maintains it under the terms of Binding Operational Directive 22-01 — federal agencies are legally required to remediate KEV-listed vulnerabilities within defined windows. For everyone else, KEV membership is the highest-confidence exploit-in-wild signal available publicly.

A CVE appearing in CISA KEV is a qualitative data point, not a quantitative score — but it's a hard signal. It means exploitation isn't theoretical. It means some organization has already been hurt by this exact vulnerability. Regardless of CVSS Base Score, a KEV-listed CVE in your environment should jump to the front of your remediation queue.

The catch: KEV covers only a fraction of exploited CVEs. CISA lists only CVEs with confirmed exploitation that meet its inclusion criteria. Threat actors exploit many CVEs that never make it into KEV, for instance when the targets are private-sector only, and for CVEs that are listed, publication can lag real-world exploitation by days to weeks. KEV is necessary but not sufficient.
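In code, a KEV check is a set membership test. The sketch below assumes you have a copy of the catalog's JSON feed with a top-level `vulnerabilities` array whose entries carry a `cveID` field. The sample payload and its CVE identifiers are made up for illustration, and the helper names are ours.

```python
import json


def kev_cve_ids(catalog: dict) -> set[str]:
    """Collect the CVE IDs listed in a parsed KEV catalog."""
    return {entry["cveID"] for entry in catalog.get("vulnerabilities", [])}


def is_known_exploited(cve_id: str, kev_ids: set[str]) -> bool:
    """True means confirmed exploitation. False does NOT mean 'not exploited':
    the catalog covers only a fraction of CVEs exploited in the wild."""
    return cve_id.upper() in kev_ids


# Hypothetical sample payload; real catalog entries carry more fields.
SAMPLE_FEED = '{"vulnerabilities": [{"cveID": "CVE-2099-0001"}, {"cveID": "CVE-2099-0002"}]}'

kev_ids = kev_cve_ids(json.loads(SAMPLE_FEED))
print(is_known_exploited("CVE-2099-0001", kev_ids))  # True
print(is_known_exploited("CVE-2099-9999", kev_ids))  # False
```

Treating the result as a one-way signal, where a hit escalates but a miss proves nothing, keeps the check consistent with the catalog's partial coverage.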

Where business context fits in

Even combining CVSS Base Score, EPSS, and CISA KEV membership, you still haven't incorporated the variable that most directly affects your actual risk: what does this asset do for your business?

Consider a plausible scenario. An environment with around 1,200 managed assets runs a weekly Qualys scan. Two findings from the same scan: CVE-A scores CVSS 8.8, EPSS 0.12, not in KEV, affecting an internal HR data server. CVE-B scores CVSS 7.4, EPSS 0.31, in CISA KEV, affecting a web application server that processes customer authentication tokens.

Sorted by CVSS Base Score alone, CVE-A ranks higher. Sorted by EPSS or KEV, CVE-B is clearly more urgent. But neither ranking accounts for the fact that CVE-B's asset handles live customer session data — its blast radius on successful exploitation extends to every authenticated user in the system, while CVE-A's impact is contained to HR data behind a VPN. The full prioritization picture requires CVSS, EPSS, KEV status, and business context together.
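The disagreement between rankings is easy to reproduce. This sketch encodes the two findings from the scenario above and sorts them three ways; the `Finding` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    cvss: float
    epss: float
    in_kev: bool
    asset: str


findings = [
    Finding("CVE-A", 8.8, 0.12, False, "internal HR data server"),
    Finding("CVE-B", 7.4, 0.31, True, "customer authentication web server"),
]

# Each sort puts the "most urgent" finding first under one signal.
by_cvss = sorted(findings, key=lambda f: f.cvss, reverse=True)
by_epss = sorted(findings, key=lambda f: f.epss, reverse=True)
by_kev = sorted(findings, key=lambda f: f.in_kev, reverse=True)

print([f.name for f in by_cvss])  # ['CVE-A', 'CVE-B']
print([f.name for f in by_epss])  # ['CVE-B', 'CVE-A']
print([f.name for f in by_kev])   # ['CVE-B', 'CVE-A']
```

None of the three sort keys looks at the `asset` field, which is the gap business context has to fill.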

The remediation SLA mismatch

Many vulnerability programs set remediation SLAs by CVSS tier: Critical findings must be remediated within 15 days; High within 30; Medium within 90. The tiers are reasonable categories, but the underlying signal is wrong if you're using CVSS Base Score alone.

This creates two failure modes. First, the high-CVSS-but-low-actual-risk finding gets emergency engineering resources pulled onto it because the SLA clock is running — work that could have been scheduled sensibly in the next sprint. Second, the lower-CVSS-but-high-actual-risk finding (KEV-listed, affecting a critical asset, actively exploited in campaigns) gets a 30-day window that's far too relaxed given its real exposure.

Business-context-adjusted risk scoring would flip the SLA assignment for both of these findings. The SLA should track the actual exposure level, not the standardized severity label.
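Both failure modes can be sketched in a few lines. The tier boundaries below follow the standard CVSS 3.1 qualitative ratings and the SLA windows come from the example above. The adjustment rules and the 0.05 EPSS cutoff are hypothetical, shown only to illustrate how context can move a finding between windows.

```python
SLA_DAYS = {"Critical": 15, "High": 30, "Medium": 90}


def cvss_tier(cvss: float) -> str:
    """Map a Base Score to its CVSS 3.1 qualitative severity rating."""
    if cvss >= 9.0:
        return "Critical"
    if cvss >= 7.0:
        return "High"
    if cvss >= 4.0:
        return "Medium"
    return "Low"


def cvss_only_sla(cvss: float):
    """SLA in days from the CVSS tier alone (None where no SLA is stated)."""
    return SLA_DAYS.get(cvss_tier(cvss))


def context_adjusted_sla(cvss: float, epss: float, in_kev: bool, asset_critical: bool):
    """Hypothetical adjustment: tighten or relax the tier based on exposure."""
    tier = cvss_tier(cvss)
    if in_kev and asset_critical:
        tier = "Critical"  # confirmed exploitation on a key asset
    elif tier == "Critical" and not in_kev and epss < 0.05 and not asset_critical:
        tier = "High"      # severe on paper, little evidence of real exposure
    return SLA_DAYS.get(tier)


# Lower-CVSS finding that is KEV-listed on a critical asset: 30 days becomes 15.
print(cvss_only_sla(7.4), context_adjusted_sla(7.4, 0.31, True, True))      # 30 15
# High-CVSS finding with little exposure evidence: 15 days becomes 30.
print(cvss_only_sla(9.8), context_adjusted_sla(9.8, 0.003, False, False))  # 15 30
```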

What a better model looks like

The model we converged on for Vendrsec combines four inputs per finding: CVSS Base Score (as one weighted factor, not a sorting key), current EPSS score, CISA KEV and threat intel feed membership, and asset criticality score from the organization's own asset inventory. Each input is weighted, and the combined output is a Vendrsec Risk Score — a number that tells you whether this finding requires immediate action, scheduled remediation, or queued backlog treatment.

The weighting isn't a fixed formula — it adjusts based on what Vendrsec knows about your environment. A KEV-listed CVE on a Tier-1 critical asset in an environment where that asset type is actively targeted in current campaigns scores dramatically higher than a KEV-listed CVE on an isolated internal tool with no inbound path from production.
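To show the general shape of a multi-input score, here is a deliberately simplified sketch. It is not the Vendrsec weighting, which as described above adapts to the environment. The fixed weights, the 0 to 1 asset criticality scale, the 0 to 100 output range, and the action thresholds are all hypothetical.

```python
# Hypothetical fixed weights; they sum to 1.0 so the score lands in 0..100.
WEIGHTS = {"cvss": 0.2, "epss": 0.3, "kev": 0.3, "criticality": 0.2}


def risk_score(cvss: float, epss: float, in_kev: bool, criticality: float) -> float:
    """Blend four inputs into a 0..100 score.

    cvss:        Base Score, 0.0 to 10.0
    epss:        exploitation probability, 0.0 to 1.0
    in_kev:      True if listed in the KEV catalog or a threat intel feed
    criticality: asset criticality from the asset inventory, 0.0 to 1.0
    """
    blended = (
        WEIGHTS["cvss"] * (cvss / 10.0)
        + WEIGHTS["epss"] * epss
        + WEIGHTS["kev"] * (1.0 if in_kev else 0.0)
        + WEIGHTS["criticality"] * criticality
    )
    return round(blended * 100, 1)


def recommended_action(score: float) -> str:
    """Map a score to one of three treatments (thresholds are illustrative)."""
    if score >= 70:
        return "immediate action"
    if score >= 40:
        return "scheduled remediation"
    return "queued backlog"


# Criticality values (0.4 and 1.0) are assumed for the sake of the example.
cve_a = risk_score(cvss=8.8, epss=0.12, in_kev=False, criticality=0.4)
cve_b = risk_score(cvss=7.4, epss=0.31, in_kev=True, criticality=1.0)
print(cve_a, recommended_action(cve_a))
print(cve_b, recommended_action(cve_b))
```

Even with crude fixed weights, the finding with the lower CVSS Base Score comes out well ahead once exploitation evidence and asset criticality are counted.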

CVSS doesn't disappear from the output. It stays visible as an input factor — engineers want to see it, and it carries real information about exploitation difficulty and impact magnitude. What changes is that it stops being the sorting key. It becomes one voice in a conversation instead of the only voice in the room.

Understanding this distinction — what CVSS was designed to measure versus what organizations actually need — is the foundational step. Every prioritization improvement builds from this starting point.