Remediation SLAs That Engineering Teams Actually Meet

The standard vulnerability remediation SLA policy reads something like: Critical — 15 days. High — 30 days. Medium — 90 days. Low — 180 days. Most security policies have a version of this. Most engineering teams are in violation of it most of the time. And in a perverse way, that's almost by design.

We spent a long time running these workflows manually before building tooling around them, and the clearest pattern we saw was that blanket severity-based SLAs fail for a specific, predictable reason: they treat all findings of the same CVSS severity as identical remediation obligations, when in practice the remediation burden varies enormously based on what's affected and how likely it is to be exploited. An engineering team that receives a "Critical: fix in 15 days" ticket for a CVE on an air-gapped build server learns to react to the same ticket on the payment processing API in the same way: with low urgency, because the urgency signal is identical. The SLA has flattened the information.

This post is about how to structure SLAs that carry more information, why that makes them more credible to engineering teams, and how to measure adherence without creating gaming incentives that undermine the whole exercise.

Why Blanket CVSS-Based SLAs Fail

The first failure mode is volume. If your scanner produces 800 Critical findings per quarter (common in mid-size environments with flat asset criticality models), you've just created 800 "fix in 15 days" obligations. Engineering teams look at that queue and make a rational decision: these can't all actually be critical. And they're right, but they lack the data to distinguish the ones that actually are from the ones that aren't. The result is that nothing in the Critical queue gets treated as sprint-blocking, because treating all of it as sprint-blocking is operationally impossible.

The second failure mode is that CVSS severity doesn't correlate well with remediation difficulty. A CVSS 9.8 on a library that has a patch available and ships to production in your standard release pipeline might take 2 days to close. A CVSS 9.8 on a component embedded in a medical device firmware stack managed by a third-party vendor might have no available patch and a 90-day vendor response cycle. A 15-day SLA applied to both is too loose for the first and unachievable for the second.

The third failure mode is that it creates compliance theater. SLAs that engineering teams consistently miss aren't enforced; they become background noise. Security teams learn not to escalate because escalating every missed SLA would consume all available management bandwidth. The SLA exists on paper, and actual remediation happens on whatever schedule engineering fits it into organically.

Building a Two-Axis SLA Model

The SLA structure that works better has two axes: asset criticality tier and exploit likelihood. The combination of these two dimensions produces a prioritization tier, and the prioritization tier maps to the SLA window.

Asset criticality tier is something you define once and maintain in your asset inventory. It's not about CVSS — it's about what the asset does in your business. Tier 1 assets (payment processing, authentication services, data storage for regulated data, internet-facing ingestion endpoints) carry a higher blast radius than Tier 3 assets (internal dev tooling, staging environments, air-gapped test infrastructure). The tier is assigned by a combination of: data sensitivity of what the asset handles, network exposure (internet-facing vs. internal vs. air-gapped), and revenue/compliance dependency.

Exploit likelihood comes from three signals: CISA KEV status, EPSS, and public exploit availability. Is this CVE in the KEV catalog? High. Does it have an EPSS score above some threshold (we use the 70th percentile as a starting point)? High. Does it have a public PoC with a working exploit? High. Everything else defaults to standard.
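As a rough illustration, the exploit-likelihood rule above can be written as a small classifier. This is a minimal sketch, not our production logic: the `ExploitSignals` type, its field names, and the choice to express the 70th-percentile cutoff as the fraction 0.70 are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed cutoff: the "70th percentile" starting point, written as a fraction.
EPSS_PERCENTILE_THRESHOLD = 0.70


@dataclass(frozen=True)
class ExploitSignals:
    """Hypothetical container for the three signals described in the text."""
    in_kev: bool             # CVE is listed in the CISA KEV catalog
    epss_percentile: float   # EPSS percentile, in the range 0.0 to 1.0
    has_public_poc: bool     # a public PoC with a working exploit exists


def exploit_likelihood(signals: ExploitSignals) -> str:
    """Return "high" if any single signal fires, otherwise "standard"."""
    if signals.in_kev:
        return "high"
    if signals.epss_percentile > EPSS_PERCENTILE_THRESHOLD:
        return "high"
    if signals.has_public_poc:
        return "high"
    return "standard"
```

Any one signal is enough to promote a finding, which matches the "everything else defaults to standard" rule. Tuning the program then mostly means moving `EPSS_PERCENTILE_THRESHOLD`.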

The resulting SLA matrix looks roughly like this:

High exploit likelihood + Tier 1 asset: 7 days. This is the sprint-blocking tier. It should represent a small fraction of total findings — if it's more than 2-3% of your monthly volume, your thresholds are too loose. These are the findings where known exploitation activity intersects with your highest-value infrastructure.

High exploit likelihood + Tier 2-3 asset: 21 days. Tracked, not sprint-blocking. Engineering team acknowledges the finding within 48 hours and commits to a patch in the next sprint cycle.

Standard exploit likelihood + Tier 1 asset: 30 days. These are the theoretically dangerous, currently quiet findings. The window is longer but the finding stays visible and escalates if EPSS moves.

Standard exploit likelihood + Tier 2-3 asset: 60-90 days depending on severity, normal patch cadence. No sprint interruption.

The key change here is that the SLA is defined by the intersection of two variables, not one. "High exploit likelihood + Tier 1 asset" is a smaller population than "all Critical CVSS." Engineering teams that get a ticket tagged as "Tier 1 / Active Exploitation" understand that this is genuinely different from the steady stream of CVE tickets they see monthly.
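The matrix is small enough to express as a lookup table. The sketch below is illustrative only: the function names are hypothetical, and the way the 60-90 day range is split by severity (Critical and High get 60 days, everything else 90) is an assumption, since the matrix only says the window depends on severity.

```python
from typing import Iterable, Tuple

# SLA windows in days, keyed by (exploit likelihood, asset tier).
SLA_DAYS = {
    ("high", 1): 7,       # sprint-blocking tier
    ("high", 2): 21,
    ("high", 3): 21,
    ("standard", 1): 30,
}


def sla_window_days(likelihood: str, tier: int, severity: str = "medium") -> int:
    """Look up the SLA window for one finding."""
    if likelihood not in ("high", "standard") or tier not in (1, 2, 3):
        raise ValueError(f"unknown likelihood/tier: {likelihood!r}, {tier!r}")
    if (likelihood, tier) in SLA_DAYS:
        return SLA_DAYS[(likelihood, tier)]
    # Remaining case: standard likelihood on a Tier 2-3 asset.
    # Assumed split of the 60-90 day range by severity.
    return 60 if severity.lower() in ("critical", "high") else 90


def sprint_blocking_share(findings: Iterable[Tuple[str, int]]) -> float:
    """Fraction of (likelihood, tier) findings that land in the 7-day tier."""
    items = list(findings)
    if not items:
        return 0.0
    return sum(1 for f in items if f == ("high", 1)) / len(items)
```

Running `sprint_blocking_share` over a month of findings gives a quick check on the 2-3% guideline: if the share comes out well above 0.03, the thresholds feeding the matrix are probably too loose.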

The SLA Start Clock Problem

One decision that matters a lot but rarely appears in SLA policy: when does the SLA clock start?

There are three options: when the scanner first detects the finding, when the security team triages and assigns it, or when the remediation ticket is created in the engineering queue. Each has different implications.

Starting from first detection is accurate but creates a fairness problem. If your triage process takes 5-7 days (common without automation), you're burning half the SLA window before engineering even sees the ticket. Teams will game this by arguing the clock should start when they receive it — and they're not entirely wrong.

Starting from ticket creation is the most engineering-friendly but can be gamed by security teams that delay triage to reset the clock on aging findings. It also creates a perverse incentive to not create tickets for difficult-to-remediate findings.

The approach we settled on: the SLA clock starts from first detection, but the triage phase has its own committed SLA (e.g., security team triages and creates a ticket within 48 hours for Tier 1 findings, 72 hours for others). If security misses the triage SLA, the remediation SLA window extends to compensate. This makes both teams accountable to their portion of the process rather than the whole window defaulting to engineering.
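One way to implement that split clock is sketched below. The exact adjustment rule is an assumption: here the remediation deadline is pushed out by however long triage overran its own SLA, so engineering never pays for security's delay. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def triage_sla(tier: int) -> timedelta:
    """Triage commitment from the text: 48 hours for Tier 1, 72 hours otherwise."""
    return timedelta(hours=48 if tier == 1 else 72)


def remediation_deadline(
    first_detected: datetime,
    ticket_created: datetime,
    tier: int,
    sla_days: int,
) -> datetime:
    """Deadline measured from first detection, extended by any triage overrun.

    Assumption: the window "adjusts" by exactly the time security exceeded
    its triage SLA. On-time triage leaves the deadline unchanged.
    """
    triage_taken = ticket_created - first_detected
    overrun = max(triage_taken - triage_sla(tier), timedelta(0))
    return first_detected + timedelta(days=sla_days) + overrun
```

For example, a Tier 1 finding detected on January 1 and ticketed 96 hours later has overrun triage by 48 hours, so a 7-day window ends on January 10 rather than January 8.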

Measuring Adherence Without Enabling Gaming

The three most common gaming patterns in SLA compliance reporting:

False-close / reopen cycling: Findings get closed in the ticketing system before scanner verification confirms the fix. When the next scan runs and re-detects, a new finding opens with a fresh clock. The MTTR looks great; the vulnerability persisted.

Defense: require scanner confirmation before closure, and track the reopen rate. A finding that reopens within 14 days of closure is flagged. If your reopen rate exceeds 5-8%, you have a verification process problem.
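A reopen-rate check along these lines might look like the following. It is a minimal sketch: the `Closure` record and its fields are hypothetical, and it assumes you can join ticket closures to the date of the next scan that re-detected the same finding.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

REOPEN_WINDOW_DAYS = 14  # "reopens within 14 days of closure"


@dataclass(frozen=True)
class Closure:
    """Hypothetical record: one ticket closure and any later re-detection."""
    finding_id: str
    closed_on: date
    redetected_on: Optional[date] = None  # None if no scan re-detected it


def is_flagged_reopen(closure: Closure) -> bool:
    """True if the finding was re-detected within the window after closure."""
    if closure.redetected_on is None:
        return False
    days_after = (closure.redetected_on - closure.closed_on).days
    return 0 <= days_after <= REOPEN_WINDOW_DAYS


def reopen_rate(closures: List[Closure]) -> float:
    """Share of closures that were flagged as reopens."""
    if not closures:
        return 0.0
    return sum(is_flagged_reopen(c) for c in closures) / len(closures)
```

Comparing `reopen_rate` against the 5-8% range gives a single number to watch, and the flagged closures are the ones worth reviewing by hand.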

Accepted risk queue inflation: Hard-to-remediate findings get moved to "accepted risk" status, which removes them from SLA tracking. CVSS 9.x findings accepted as risk because the vendor hasn't released a patch are fine — that's a legitimate compensating control acknowledgment. But if the accepted-risk queue is growing faster than the remediation queue, something is wrong.

Defense: track accepted-risk queue size and age separately. Report it alongside SLA compliance numbers. Flag accepted findings where the EPSS score has moved above threshold since acceptance, because the risk calculus has changed.
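The accepted-risk report can be sketched as below. The record shape and field names are assumptions for illustration, as is reusing the 70th-percentile EPSS cutoff as the re-evaluation threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

EPSS_PERCENTILE_THRESHOLD = 0.70  # assumed: same cutoff used for prioritization


@dataclass(frozen=True)
class AcceptedRisk:
    """Hypothetical record for one finding in the accepted-risk queue."""
    finding_id: str
    accepted_on: date
    epss_at_acceptance: float  # EPSS percentile when the risk was accepted
    epss_now: float            # current EPSS percentile


def needs_reevaluation(item: AcceptedRisk) -> bool:
    """True if EPSS has crossed the threshold since the risk was accepted."""
    return item.epss_at_acceptance <= EPSS_PERCENTILE_THRESHOLD < item.epss_now


def accepted_risk_report(items: List[AcceptedRisk], today: date) -> Dict[str, object]:
    """Queue size, age of the oldest entry, and findings to re-evaluate."""
    ages = [(today - item.accepted_on).days for item in items]
    return {
        "queue_size": len(items),
        "oldest_age_days": max(ages, default=0),
        "reevaluate": [i.finding_id for i in items if needs_reevaluation(i)],
    }
```

Publishing this report next to the SLA compliance figure makes it harder for a shrinking open queue to hide a growing accepted-risk queue.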

Scope creep on asset criticality: Assets get reclassified to lower tiers to reduce the SLA burden they generate. This happens organically when teams are measured on SLA compliance and they control their own asset criticality ratings.

Defense: asset criticality changes require a documented approval — either a security team review or a business justification. Automated alerts when a Tier 1 asset is reclassified to Tier 2 give you visibility without turning every reclassification into a bureaucratic event.
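A lightweight version of that control is sketched here, assuming tier changes are available as simple records with an optional approval reference. The `TierChange` type and its fields are hypothetical, and the alert is generalized slightly to cover any move of a Tier 1 asset to a lower tier.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TierChange:
    """Hypothetical record of one asset criticality change."""
    asset_id: str
    old_tier: int
    new_tier: int
    approval_ref: Optional[str] = None  # ticket or review ID, if documented


def review_tier_changes(changes: List[TierChange]) -> Tuple[List[str], List[str]]:
    """Return (alerts, unapproved).

    alerts:     Tier 1 assets moved to a lower tier (higher tier number).
    unapproved: any actual tier change with no documented approval.
    """
    alerts = [c.asset_id for c in changes if c.old_tier == 1 and c.new_tier > 1]
    unapproved = [
        c.asset_id
        for c in changes
        if c.old_tier != c.new_tier and not c.approval_ref
    ]
    return alerts, unapproved
```

The alert list is informational and the unapproved list is the policy violation. Keeping them separate is what lets approved reclassifications go through without ceremony.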

Communicating SLAs to Engineering Teams

SLAs that engineering teams actually meet are negotiated, not dictated. The policy can come from security, but the specific windows for each tier should reflect actual remediation capacity. If your engineering teams take two weeks to complete a sprint and can realistically commit one sprint-blocking item per sprint from security, a 7-day window for the highest tier is already misaligned before the first ticket is opened.

We've seen better results from running a calibration exercise before locking SLA windows: look at the last 90 days of remediation data, identify the actual distribution of time-to-close by severity and asset tier, and use that as the baseline. If the 75th percentile for your current "Critical" tier is 42 days, a quarter of findings already take longer than 42 days to close, and a 15-day SLA will be missed by a far larger share than that from day one. Starting at 45 days, then driving it down to 30 over the next quarter, is a program that shows progress rather than chronic failure.
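The calibration step comes down to two numbers per tier: a percentile of historical time-to-close, and the miss rate a candidate window would have produced. A minimal sketch follows, using the nearest-rank percentile definition. The sample data is invented purely to show the arithmetic.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no data to calibrate against")
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def miss_rate(days_to_close: Sequence[float], sla_days: float) -> float:
    """Share of historical findings that took longer than sla_days to close."""
    if not days_to_close:
        raise ValueError("no data to calibrate against")
    return sum(d > sla_days for d in days_to_close) / len(days_to_close)


# Invented time-to-close values (days) for one tier over the last 90 days.
history = [5, 10, 20, 30, 40, 42, 50, 60]

p75 = percentile_nearest_rank(history, 75)   # 42
miss_at_15 = miss_rate(history, 15)          # 0.75
miss_at_45 = miss_rate(history, 45)          # 0.25
```

On this sample, a 15-day window would have been missed three times out of four, while a 45-day window would have been missed one time in four. That gap is the argument for starting near the observed 75th percentile and tightening from there.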

We're not saying SLAs should be set based on what's easy to hit — the point is for them to be tight enough to create urgency without being so unachievable that teams stop caring. That calibration is something you have to do with real data from your environment, not from a compliance framework template.

The goal of an SLA program is actual risk reduction, not SLA compliance as a metric in its own right. A program where 95% of findings close within SLA because SLAs are set loosely is worse than a program where 70% close within SLA but the SLA windows are tight and the findings in the top tier are genuinely being sprint-blocked. The adherence number only means something if the SLA structure beneath it reflects actual risk prioritization.
