Every vulnerability management policy I've seen includes an SLA table. Critical findings: remediate within 15 or 30 days. High: 60 days. Medium: 90 days. The table is easy to write and hard to enforce — not because engineering teams are uncooperative, but because the gap between "SLA defined" and "SLA operationalized" is much wider than it looks on paper.
The definition problem is mostly solved. The operationalization problem — who owns enforcement, what escalation looks like, how compliance is measured, and what happens to findings that legitimately can't be remediated on the standard schedule — is where most programs quietly abandon the SLA and shift to best-effort. This post is about making SLA enforcement work in a small security team managing real infrastructure at realistic headcount.
The Four Places SLA Enforcement Breaks Down
When we talk to security teams about why their SLAs aren't working, the failure points cluster into four categories. Understanding which one is active in your program determines the fix.
1. Owner assignment gaps. A remediation ticket that doesn't have a specific named owner will not be closed by the SLA deadline. "Platform team" is not an owner. "Backend engineers" is not an owner. The SLA clock runs from ticket creation; if the ticket spends 12 days sitting in a team queue without assignment to a specific person, you've already consumed 40% of a 30-day window before any work has started.
2. SLA start time disputes. When a finding is escalated past its deadline, the first thing that happens is debate about when the clock started. Did the SLA start when the scanner first found it? When the ticket was created? When the engineer acknowledged it? Without a single, unambiguous definition of SLA start time documented and consistently applied, every escalation becomes a negotiation rather than a policy enforcement action.
3. Escalation path opacity. Engineers don't respond well to generic deadline warnings. "Your ticket is due in 5 days" is an email many security programs send, and it rarely prompts action. What works better: "api-gateway-prod has CVE-2025-3117 due March 27. Exploit kit confirmed active. This is above the standard threshold, so I need your patch plan by EOD Friday or I'm looping in your manager." The message names a specific asset, a specific deadline, and a specific consequence. Generic SLA reminders are noise; specific escalation emails are signal.
4. No exception process. Some findings genuinely can't be remediated on the standard schedule. The patch doesn't exist yet. The fix requires a maintenance window that's 45 days out. The CVE affects a third-party appliance with no vendor update. If your SLA policy has no documented exception process, teams will route around it — either by marking findings as accepted risk (inflating your accepted-risk queue) or by simply ignoring the deadline. An explicit exception process with a documented compensating control requirement actually tightens enforcement on everything that doesn't qualify for exception.
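The first two failure points can be made visible with very little code. The following is a minimal sketch, not a reference implementation: the `Ticket` fields, the `TEAM_QUEUES` set, and both function names are hypothetical, and it assumes the SLA clock starts at ticket creation, as in the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: queue names that must never count as an owner.
TEAM_QUEUES = {"platform team", "backend engineers"}


@dataclass
class Ticket:
    finding_id: str
    created_at: datetime              # assumption: the single documented SLA start time
    sla_days: int
    owner: Optional[str] = None       # a named person, not a team queue
    assigned_at: Optional[datetime] = None


def has_named_owner(ticket: Ticket) -> bool:
    """True only if the owner field holds something other than a team queue."""
    return bool(ticket.owner) and ticket.owner.strip().lower() not in TEAM_QUEUES


def sla_consumed_before_assignment(ticket: Ticket, now: datetime) -> float:
    """Fraction of the SLA window that elapsed before a person was assigned.

    If the ticket is still unassigned, the fraction is measured up to `now`.
    """
    end = ticket.assigned_at or now
    return (end - ticket.created_at) / timedelta(days=ticket.sla_days)
```

A ticket created on March 1 with a 30-day SLA and assigned on March 13 reports 0.4, which is the 12-day, 40% case described in the first item.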
SLA Tiers That Reflect Actual Risk
The standard severity-based SLA table (Critical = 30 days, High = 60 days) is a starting point, not a final answer. Flat SLAs by severity ignore the business risk context that makes some high-severity findings genuinely urgent and others genuinely deferrable.
We recommend a two-dimensional SLA matrix: severity on one axis, exploit status on the other.
| Severity | No Known Exploit | Public PoC Available | Active Exploitation (CISA KEV) |
|---|---|---|---|
| Critical (9.0+) | 30 days | 14 days | 72 hours |
| High (7.0–8.9) | 60 days | 30 days | 7 days |
| Medium (4.0–6.9) | 90 days | 60 days | 30 days |
The 72-hour SLA for actively exploited critical findings is aggressive, and it should be: a CISA KEV listing means there is evidence the vulnerability is being exploited in the wild. Compensating controls are an option here. If a patch isn't available or the patch window is more than 72 hours away, isolation or network-level mitigation counts as remediation, provided it's documented.
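The matrix above translates directly into a lookup table. This is a sketch under assumptions: the status keys (`"none"`, `"poc"`, `"active"`) and the function names are illustrative, and findings below CVSS 4.0 return `None` because the table does not define an SLA for them.

```python
from datetime import timedelta
from typing import Optional

# Values copied from the SLA matrix; the key names are assumptions.
SLA_MATRIX = {
    "critical": {"none": timedelta(days=30), "poc": timedelta(days=14), "active": timedelta(hours=72)},
    "high":     {"none": timedelta(days=60), "poc": timedelta(days=30), "active": timedelta(days=7)},
    "medium":   {"none": timedelta(days=90), "poc": timedelta(days=60), "active": timedelta(days=30)},
}


def severity_tier(cvss: float) -> Optional[str]:
    """Map a CVSS base score to the tiers used in the matrix."""
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    return None  # below the tiers the matrix covers


def sla_window(cvss: float, exploit_status: str) -> Optional[timedelta]:
    """Return the remediation window, or None if the matrix has no entry."""
    tier = severity_tier(cvss)
    if tier is None:
        return None
    return SLA_MATRIX[tier][exploit_status]
```

How `exploit_status` gets populated (for example, by checking a finding's CVE against the KEV catalog) is outside this sketch.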
Making Escalation Automatic
Manual SLA tracking at scale is untenable for a 2-3 person security team. If enforcement depends on someone remembering to check a spreadsheet weekly and send emails, it will degrade whenever that person has other priorities — which is always.
Automated escalation means: the system watches SLA deadlines and triggers escalation actions without a human initiating them. Concretely:
At 75% of SLA elapsed (e.g., around day 22 of a 30-day window): the assigned engineer gets a direct Slack message with the specific finding, asset, and deadline. This is not an email digest but a direct message that expects a response. The message should ask for one of three responses: "on track for date X," "needs extension: reason Y," or "blocked by Z."
At 90% of SLA elapsed (day 27): the engineer's manager is added to the escalation thread. This is not punitive — it's operational. Engineering managers need visibility into security work that's about to breach a committed deadline.
At SLA breach: the security program lead and the engineering manager both receive a breach notification. The finding is automatically re-classified as SLA-breached in the tracking system, which shows up in the weekly CISO report. The breach is visible and documented.
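The three thresholds above reduce to a small function that a scheduled job could call for each open finding. This is a hedged sketch: the stage labels and the function name are hypothetical, and delivery (Slack messages, manager notifications, report updates) is deliberately left out.

```python
from datetime import datetime, timedelta


def escalation_stage(sla_start: datetime, sla_window: timedelta, now: datetime) -> str:
    """Return which escalation step applies, based on the fraction of SLA elapsed.

    Stage labels are illustrative:
      "engineer" -> direct message to the assigned engineer (75% elapsed)
      "manager"  -> engineer's manager added to the thread (90% elapsed)
      "breach"   -> breach notification and re-classification (100% elapsed)
    """
    elapsed = (now - sla_start) / sla_window  # timedelta / timedelta gives a float
    if elapsed >= 1.0:
        return "breach"
    if elapsed >= 0.90:
        return "manager"
    if elapsed >= 0.75:
        return "engineer"
    return "none"
```

For a 30-day window, 75% is 22.5 days, so a finding 23 days into its window returns `"engineer"` and one 27 days in returns `"manager"`. A real job would also need to record which stage has already fired so each notification is sent once.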
The escalation path only works if it's known in advance. Before the SLA policy goes live, security and engineering leadership need to agree: when security escalates to your manager, that's not an accusation — it's the documented process for findings that haven't been resolved. Organizations that have agreed on this in advance handle escalations as routine; organizations that haven't often experience them as political incidents.
The Exception Process That Tightens, Not Loosens
A formal exception process sounds like a loophole factory. In practice, a well-designed exception process reduces SLA breaches because it forces teams to engage with the finding rather than passively letting the deadline pass.
An exception request must include: the specific finding ID, the reason the standard SLA can't be met (specific, not "we're busy"), the compensating control currently in place, and the new committed remediation date. The request must be submitted before the SLA deadline; exceptions are not granted retroactively after a breach.
Security reviews exception requests against defined criteria: is the compensating control adequate for the risk level? Is the new committed date reasonable? Does the finding qualify for exception at all given its exploit status? Blanket exceptions for entire asset classes ("all legacy systems exempt") are not approved.
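The mechanical parts of this process (required fields and the submission cutoff) can be checked at intake, before a human reviews the request. The sketch below assumes a hypothetical `ExceptionRequest` record and function name. It does not judge whether the compensating control is adequate or the new date is reasonable; that remains a review decision.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ExceptionRequest:
    finding_id: str
    reason: str
    compensating_control: str
    new_remediation_date: datetime
    submitted_at: datetime


def intake_problems(req: ExceptionRequest, sla_deadline: datetime) -> List[str]:
    """Return reasons the request cannot go to review; an empty list means it can."""
    problems = []
    if not req.finding_id.strip():
        problems.append("missing finding ID")
    if not req.reason.strip():
        problems.append("missing reason")
    if not req.compensating_control.strip():
        problems.append("missing compensating control")
    # Requests must arrive before the deadline; no retroactive exceptions.
    if req.submitted_at >= sla_deadline:
        problems.append("submitted after SLA deadline")
    return problems
```

Returning a list of problems rather than a single boolean lets the rejection message tell the requester exactly what to fix.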
The effect: teams know that the only way to get more time is to document why, show a compensating control, and commit to a new date. This is strictly more work than just closing the ticket. The ones who go through the process are the ones with legitimate operational constraints; the rest close the ticket rather than write the exception request.
Measuring SLA Compliance in the Security Metrics Dashboard
SLA compliance should be a visible metric in the weekly security posture report: what percentage of findings due this week were closed on time? Track it by severity tier and by owning team.
We're not saying 100% SLA compliance is the right target — a program that maintains 100% by systematically over-classifying findings as exceptions isn't actually more secure. The right target is high compliance on critical and actively-exploited findings (target: 90%+ on 72-hour and 14-day SLAs), with lower compliance acceptable on medium-severity findings where operational constraints are real.
The team-level breakdown reveals systemic issues that aggregate metrics hide. If one engineering team consistently has below-average SLA compliance, the root cause is almost never that team being uncooperative — it's usually that their asset surface has elevated finding volume, their on-call rotation prevents planned maintenance windows, or the findings assigned to them require upstream dependency upgrades that aren't in their control. SLA compliance metrics used for accountability should prompt investigation, not automatic blame assignment.
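The weekly on-time rate and its breakdowns can be computed from the same finding records. This is a minimal sketch: the `Finding` fields and the `on_time_rate` name are assumptions, and "on time" is taken to mean closed at or before the due date.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class Finding:
    team: str
    severity: str
    due_at: datetime
    closed_at: Optional[datetime] = None  # None means still open


def on_time_rate(findings: Iterable[Finding], week_start: datetime,
                 week_end: datetime, key: str) -> Dict[str, float]:
    """Share of findings due in [week_start, week_end) that were closed on time.

    `key` is the attribute to group by, e.g. "team" or "severity".
    """
    due = defaultdict(int)
    met = defaultdict(int)
    for f in findings:
        if not (week_start <= f.due_at < week_end):
            continue  # only findings due this week count
        group = getattr(f, key)
        due[group] += 1
        if f.closed_at is not None and f.closed_at <= f.due_at:
            met[group] += 1
    return {group: met[group] / due[group] for group in due}
```

Calling it once with `key="severity"` and once with `key="team"` gives both views for the report. A low number for one team is a prompt to look at finding volume and blockers, as described above, not a verdict.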