OpShift - Team Alert Management

Slack Is Not an Alerting System

Slack is where your team communicates. It's also where notifications go to die at 3 AM. No matter how many channels you create or how aggressively you configure notifications, Slack alone cannot reliably wake an engineer for a critical incident.

The issue isn't Slack's fault. It's that a single notification channel can't serve every alerting scenario. A disk usage warning at 2 PM needs a different response than a production database crash at 2 AM.

This is where multi-channel escalation policies come in, and where most teams get them wrong.

The Escalation Ladder

A well-designed escalation policy moves through notification channels of increasing urgency, giving the on-call engineer a reasonable window to respond at each step before escalating.

Here's a practical escalation ladder:

Step	Channel	Fires At (after alert)	Use Case
1	Slack channel	0 min	Team visibility; low-severity alerts stop here
2	Slack DM	2 min	Direct notification to on-call engineer
3	SMS	5 min	Can break through Do Not Disturb when allowed
4	Phone call	10 min	Hardest channel to sleep through
5	WhatsApp	12 min	Alternative channel if phone is missed
6	Secondary on-call	15 min	Escalate to backup engineer
7	Engineering manager	20 min	Management escalation
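To make the ladder concrete, here is a minimal Python sketch that stores it as data and answers one question: which steps should already have fired if nobody has acknowledged? It reads the table's timings as cumulative minutes since the alert fired. `Step`, `LADDER`, `steps_due`, and the channel identifiers are hypothetical names for illustration, not an OpShift API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    offset_min: int  # minutes after the alert fires (cumulative, not per-step)
    channel: str     # hypothetical channel identifier


# The seven-step ladder from the table above.
LADDER: List[Step] = [
    Step(0, "slack_channel"),
    Step(2, "slack_dm"),
    Step(5, "sms"),
    Step(10, "phone_call"),
    Step(12, "whatsapp"),
    Step(15, "secondary_oncall"),
    Step(20, "engineering_manager"),
]


def steps_due(elapsed_min: float, ladder: List[Step] = LADDER) -> List[str]:
    """Channels that should have fired by `elapsed_min`, assuming no acknowledgment yet."""
    return [step.channel for step in ladder if step.offset_min <= elapsed_min]


print(steps_due(6))  # ['slack_channel', 'slack_dm', 'sms']
```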

The key insight: not every alert needs to traverse the full ladder. Severity determines where an alert enters the ladder and how far it goes.

Configuring by Severity

Different severity levels should start at different points in the escalation ladder:

Critical (production outage, data loss risk)

Start at: SMS + Phone call simultaneously
Escalation: Secondary on-call after 5 minutes, manager after 10
Quiet hours: Always breaks through

High (degraded service, elevated error rates)

Start at: Slack DM + SMS
Escalation: Phone call after 5 minutes, secondary on-call after 10
Quiet hours: Breaks through with SMS only (no phone call)

Medium (non-critical service issues, performance degradation)

Start at: Slack DM
Escalation: SMS after 10 minutes
Quiet hours: Held until business hours

Low (informational, trending metrics, maintenance reminders)

Start at: Slack channel only
Escalation: None
Quiet hours: Always held until business hours
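The four tiers map naturally onto a lookup table keyed by severity. The sketch below is one hypothetical way to encode them (`SeverityPolicy`, `POLICIES`, and `notifications_due` are illustrative names), assuming escalation delays are measured from when the alert fires. It records each tier's quiet-hours behavior as a label but only computes daytime notifications; quiet hours are treated as a separate filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SeverityPolicy:
    entry: Tuple[str, ...]                    # channels notified as soon as the alert fires
    escalations: Tuple[Tuple[int, str], ...]  # (minutes after alert, escalation target)
    quiet_hours: str                          # label only: "break_through", "sms_only", or "hold"


POLICIES: Dict[str, SeverityPolicy] = {
    "critical": SeverityPolicy(("sms", "phone_call"),
                               ((5, "secondary_oncall"), (10, "engineering_manager")),
                               "break_through"),
    "high": SeverityPolicy(("slack_dm", "sms"),
                           ((5, "phone_call"), (10, "secondary_oncall")),
                           "sms_only"),
    "medium": SeverityPolicy(("slack_dm",), ((10, "sms"),), "hold"),
    "low": SeverityPolicy(("slack_channel",), (), "hold"),
}


def notifications_due(severity: str, elapsed_min: float) -> List[str]:
    """Entry channels plus every escalation whose delay has elapsed (daytime behavior only)."""
    policy = POLICIES[severity]
    escalated = [target for delay, target in policy.escalations if delay <= elapsed_min]
    return list(policy.entry) + escalated


print(notifications_due("critical", 7))  # ['sms', 'phone_call', 'secondary_oncall']
print(notifications_due("low", 60))      # ['slack_channel']
```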

Integrating Quiet Hours

Quiet hours add a time-based dimension to your escalation policies. The concept is simple: during defined quiet periods, only alerts above a certain severity threshold trigger notifications.

A practical quiet hours configuration:

Quiet window: 10:00 PM to 8:00 AM local time (timezone-aware per engineer)
Break-through threshold: High and Critical severity
Held alerts: Medium and Low severity queued for morning delivery
Morning digest: Batch notification of held alerts at 8:00 AM
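Because the window wraps past midnight, the check is "after 10 PM or before 8 AM", not a simple range. This sketch applies the break-through threshold and queues everything else for the 8:00 AM digest. It assumes a single in-memory queue and that a scheduler calls `flush_morning_digest` at 8:00 AM; all names are hypothetical.

```python
from datetime import time
from typing import List, Tuple

QUIET_START = time(22, 0)  # 10:00 PM local
QUIET_END = time(8, 0)     # 8:00 AM local; also when the digest goes out
BREAK_THROUGH = {"critical", "high"}

held: List[Tuple[str, str]] = []  # (severity, summary) queued overnight


def in_quiet_window(local: time) -> bool:
    # The window wraps past midnight, so "quiet" means after the start OR before the end.
    return local >= QUIET_START or local < QUIET_END


def handle(severity: str, summary: str, local: time) -> str:
    """Deliver now, or queue for the morning digest if quiet hours suppress this severity."""
    if in_quiet_window(local) and severity not in BREAK_THROUGH:
        held.append((severity, summary))
        return "held"
    return "delivered"


def flush_morning_digest() -> List[Tuple[str, str]]:
    """Run once at QUIET_END: return every held alert as one batch and clear the queue."""
    batch = list(held)
    held.clear()
    return batch


print(handle("medium", "p95 latency up 20%", time(23, 30)))  # held
print(handle("critical", "primary DB down", time(3, 0)))     # delivered
print(flush_morning_digest())  # [('medium', 'p95 latency up 20%')]
```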

The timezone awareness is critical for distributed teams. An engineer in London shouldn't be woken at 3 AM because the quiet hours are configured for Pacific time.
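The fix is to evaluate the window in each engineer's own timezone. This hypothetical `in_quiet_hours` helper converts the current UTC time to the engineer's zone before comparing. To keep the example self-contained it uses fixed January offsets; real code would pass `zoneinfo.ZoneInfo("Europe/London")` and similar so daylight saving time is handled.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

QUIET_START, QUIET_END = time(22, 0), time(8, 0)


def in_quiet_hours(now_utc: datetime, engineer_tz: tzinfo) -> bool:
    """Evaluate the quiet window in the engineer's own timezone, not the team's."""
    local = now_utc.astimezone(engineer_tz).time()
    return local >= QUIET_START or local < QUIET_END


# Fixed winter offsets stand in for real zones in this example only.
LONDON_WINTER = timezone(timedelta(hours=0), "GMT")
PACIFIC_WINTER = timezone(timedelta(hours=-8), "PST")

now = datetime(2024, 1, 15, 3, 0, tzinfo=timezone.utc)
print(in_quiet_hours(now, LONDON_WINTER))   # True:  3:00 AM in London
print(in_quiet_hours(now, PACIFIC_WINTER))  # False: 7:00 PM the previous evening
```

Had the window been evaluated in Pacific time only, the London engineer would have been paged at 3 AM for a Medium alert.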

Common Mistakes

Mistake 1: Too Many Steps, Too Short Timers

Some teams configure 8-step escalation policies with 1-minute intervals between each step. This means the on-call engineer gets bombarded across every channel within 8 minutes, before they've even had time to open their laptop.

A better approach: give at least 3-5 minutes between steps. If someone doesn't respond to an SMS within 5 minutes, a phone call is warranted. If they don't respond within 2 minutes, they're probably just unlocking their phone.
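This rule is easy to lint. Here is a hypothetical `short_gaps` check, assuming step timings are stored as cumulative minutes since the alert and using the 3-minute lower bound from the guidance above:

```python
from typing import List, Tuple


def short_gaps(offsets_min: List[int], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Consecutive (earlier, later) step offsets that are closer than `min_gap` minutes apart."""
    return [(a, b) for a, b in zip(offsets_min, offsets_min[1:]) if b - a < min_gap]


bombard = list(range(8))               # 8 steps, 1 minute apart: minutes 0..7
print(len(short_gaps(bombard)))        # 7  (every gap is too short)
print(short_gaps([0, 5, 10, 15, 20]))  # []
```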

Mistake 2: Same Escalation for Every Alert

Using the same escalation path for a disk space warning and a complete service outage guarantees fatigue. Engineers learn that phone calls don't always mean something is on fire, so they start treating phone calls like Slack messages.

The fix: tie escalation aggressiveness to severity. Save phone calls for genuine emergencies.
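Tying aggressiveness to severity can also be enforced mechanically. This sketch, with hypothetical names and assuming only Critical and High may trigger phone calls (matching the severity tiers earlier), flags any lower-severity path that includes one:

```python
from typing import Dict, List, Tuple

URGENT = {"critical", "high"}  # severities allowed to place phone calls
INTRUSIVE = {"phone_call"}     # channels reserved for genuine emergencies


def misused_channels(policies: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """(severity, channel) pairs where an intrusive channel is attached to a non-urgent severity."""
    return [
        (severity, channel)
        for severity, channels in policies.items()
        if severity not in URGENT
        for channel in channels
        if channel in INTRUSIVE
    ]


# A one-size-fits-all setup: every severity gets the same full path.
same_for_all = {sev: ["slack_dm", "sms", "phone_call"] for sev in ("critical", "high", "medium", "low")}
print(misused_channels(same_for_all))  # [('medium', 'phone_call'), ('low', 'phone_call')]
```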

Mistake 3: No Acknowledgment Loop

Escalation should stop when someone acknowledges the alert. If your system keeps escalating after acknowledgment, it creates unnecessary noise and erodes trust in the system.

Ensure your escalation policy includes:

Acknowledgment stops further escalation
Acknowledgment can happen from any channel (Slack emoji, SMS reply, dashboard button)
If acknowledged but not resolved within a time window, a gentler follow-up reminder fires
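Those three requirements amount to a small state machine. The sketch below is an illustration under stated assumptions, not OpShift's implementation: time is simulated in minutes, `Incident` and its methods are hypothetical, and the 30-minute reminder window is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Incident:
    ladder: List[Tuple[int, str]]  # (minutes after alert, channel), sorted by time
    reminder_after_min: int = 30   # hypothetical follow-up window after acknowledgment
    acked_at: Optional[float] = None
    acked_via: Optional[str] = None
    resolved: bool = False
    reminded: bool = False
    next_step: int = 0

    def acknowledge(self, at_min: float, via: str) -> None:
        # Any channel counts (Slack reaction, SMS reply, dashboard button); the first ack wins.
        if self.acked_at is None:
            self.acked_at, self.acked_via = at_min, via

    def resolve(self) -> None:
        self.resolved = True

    def tick(self, now_min: float) -> List[str]:
        """Return the notifications to send at `now_min`."""
        if self.resolved:
            return []
        if self.acked_at is not None:
            # Escalation has stopped; send at most one gentle reminder while unresolved.
            if not self.reminded and now_min - self.acked_at >= self.reminder_after_min:
                self.reminded = True
                return ["gentle_reminder"]
            return []
        due = []
        while self.next_step < len(self.ladder) and self.ladder[self.next_step][0] <= now_min:
            due.append(self.ladder[self.next_step][1])
            self.next_step += 1
        return due


inc = Incident(ladder=[(0, "slack_dm"), (5, "sms"), (10, "phone_call")])
print(inc.tick(0))   # ['slack_dm']
inc.acknowledge(3, via="slack_reaction")
print(inc.tick(10))  # []  (the ack stopped the SMS and phone call)
print(inc.tick(33))  # ['gentle_reminder']
```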

Mistake 4: Forgetting the Secondary On-Call

Every escalation policy should include a backup. The primary on-call engineer might be in a dead zone, might have a phone issue, or might be dealing with a separate incident.

Best practice: always have a secondary on-call who gets notified if the primary doesn't acknowledge within the defined window.
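A minimal sketch of the fallback rule, with hypothetical names. The 15-minute default mirrors step 6 of the ladder above, and the function refuses to run for a rotation with no secondary, so a missing backup fails loudly instead of silently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rotation:
    primary: str
    secondary: Optional[str] = None


def who_to_page(rot: Rotation, elapsed_min: float, acknowledged: bool,
                ack_window_min: int = 15) -> List[str]:
    """Page the primary; add the secondary once the ack window passes without a response."""
    if rot.secondary is None:
        raise ValueError("rotation has no secondary on-call configured")
    if acknowledged or elapsed_min < ack_window_min:
        return [rot.primary]
    return [rot.primary, rot.secondary]


rot = Rotation(primary="alice", secondary="bob")
print(who_to_page(rot, 4, acknowledged=False))   # ['alice']
print(who_to_page(rot, 15, acknowledged=False))  # ['alice', 'bob']
```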

Mistake 5: Not Testing the Escalation Path

You should test your escalation policies regularly. A monthly test page that traverses the full escalation path ensures:

Phone numbers are correct and reachable
SMS delivery is working
Slack integrations haven't broken
Secondary on-call contacts are up to date
Quiet hours configuration is correct
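Parts of that checklist can be automated. This sketch walks every step of a path without waiting between steps, checks phone-based contacts against the E.164 number format, and records delivery failures. `send` is a hypothetical stand-in for whatever delivery hook your tooling exposes; here a fake sender simulates a broken Slack integration.

```python
import re
from typing import Callable, List, Tuple

E164 = re.compile(r"^\+[1-9]\d{1,14}$")  # "+", country code, at most 15 digits in total
PHONE_CHANNELS = {"sms", "phone_call", "whatsapp"}


def run_test_page(path: List[Tuple[str, str]],
                  send: Callable[[str, str], bool]) -> List[str]:
    """Walk every (channel, contact) step of the path and return the problems found."""
    problems = []
    for channel, contact in path:
        if channel in PHONE_CHANNELS and not E164.match(contact):
            problems.append(f"{channel}: malformed number {contact!r}")
            continue
        try:
            delivered = send(channel, contact)
        except Exception:  # a broken integration should be reported, not abort the drill
            delivered = False
        if not delivered:
            problems.append(f"{channel}: no delivery to {contact}")
    return problems


def fake_send(channel: str, contact: str) -> bool:
    # Stand-in for real delivery: pretend the Slack integration has broken.
    return channel != "slack_dm"


path = [("slack_dm", "@oncall"), ("sms", "+14155550100"), ("phone_call", "415-555-0100")]
print(run_test_page(path, fake_send))
# ['slack_dm: no delivery to @oncall', "phone_call: malformed number '415-555-0100'"]
```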

Building the Right Policy for Your Team

Start with these questions:

How many severity levels do you need? (Most teams do well with 3-4)
What's the maximum acceptable time to acknowledge a critical incident? (Most teams target 5-10 minutes)
Who is the secondary on-call? (Always have one)
What hours should be considered "quiet"? (Account for timezones)
Which channels does your team actually respond to? (Test this rather than assuming)

Then build your policy from the answers, starting simple and adding complexity only when you identify gaps.

A Practical Starting Configuration

For most teams getting started with multi-channel escalation, this configuration works well:

Default policy (Medium/Low): Slack channel → (5 min) → Slack DM → (15 min) → SMS to secondary

Urgent policy (High/Critical): Slack DM + SMS → (5 min) → Phone call → (5 min) → Secondary on-call SMS + Phone → (10 min) → Manager notification

Quiet hours: 10 PM - 8 AM, only High/Critical break through

This gives you four escalation steps for urgent issues and a 20-minute window to get someone engaged, while keeping non-urgent alerts out of people's pockets during off-hours.
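The arithmetic behind those numbers is easy to check by converting per-step delays into minutes since the alert fired. Here is a short sketch, with hypothetical step labels taken from the arrow diagrams above:

```python
from itertools import accumulate
from typing import List, Tuple

# Each step paired with the delay *before* it, read straight off the arrow diagrams.
DEFAULT_POLICY = [(0, "slack_channel"), (5, "slack_dm"), (15, "sms_to_secondary")]
URGENT_POLICY = [
    (0, "slack_dm + sms"),
    (5, "phone_call"),
    (5, "secondary sms + phone"),
    (10, "manager_notification"),
]


def timeline(policy: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Convert per-step delays into minutes since the alert fired."""
    starts = accumulate(delay for delay, _ in policy)
    return [(t, step) for t, (_, step) in zip(starts, policy)]


for minute, step in timeline(URGENT_POLICY):
    print(f"{minute:>2} min  {step}")
# Steps land at 0, 5, 10 and 20 minutes: four steps, with the manager notified at minute 20.
```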

Getting Started

OpShift supports multi-channel escalation across Slack, SMS, phone calls, and WhatsApp, with severity-based routing, quiet hours, and acknowledgment from any channel. Escalation policies are configured per team and respect PTO schedules automatically.

Flat pricing at $14/month for up to 50 users. No per-seat charges. Set up your escalation policies at opshift.io.

Setting Up Multi-Channel Escalation Policies That Actually Work