
How to Catch Broken Releases Before Users Report Them

A practical guide to using error tracking, release tagging, and automated alerts to detect broken deploys within minutes — not after user complaints.

The Friday Deploy Disaster

You deploy at 4 PM on Friday. Everything looks fine. You go home. At 9 PM, your inbox has 47 user complaints about a broken checkout flow. The error has been happening for 5 hours, affecting every transaction. You missed it because your error tracking dashboard only shows an aggregate error count — the new errors were lost in the existing noise.

This scenario is preventable. Here's how.

Why Bad Deploys Slip Through

Most teams detect broken releases through one of these channels:

  1. User reports — slowest and most embarrassing
  2. Manual checking — "let me click around after deploying" (unreliable)
  3. CI/CD tests — catch code bugs, not runtime issues
  4. Error monitoring — should catch it, but often doesn't

Error monitoring fails to catch bad deploys when:

  • The new errors blend into existing error noise
  • Alert rules trigger on new error types but not on error rate spikes
  • There's no release tagging, so you can't filter by deploy version
  • The dashboard shows aggregate numbers, not per-release comparisons

Release Tagging: The Foundation

Release tagging attaches a version identifier to every error event. This lets you answer: "did this error exist before deploy v2.3.1?"

Bugsly.init({
  dsn: "YOUR_DSN",
  release: "my-app@2.3.1", // Tag with your version
  environment: "production",
});
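
In practice you rarely hard-code the version string. A common pattern is to inject it at deploy time from your CI environment — a minimal sketch, assuming GitHub Actions, which exposes the commit SHA as GITHUB_SHA (adjust the variable for your CI provider):

// Derive the release identifier from the CI environment instead of hard-coding it.
// GITHUB_SHA is GitHub Actions' built-in commit variable; other CIs use different names.
const sha = process.env.GITHUB_SHA;
const release = sha ? `my-app@${sha.slice(0, 7)}` : "my-app@dev";

Bugsly.init({
  dsn: "YOUR_DSN",
  release,
  environment: process.env.NODE_ENV || "production",
});

Now every error event carries the exact commit that produced it, which is what makes the per-release comparisons below possible.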

With release tags, you can:

  • Filter errors by release version
  • See which release introduced a new error
  • Compare error rates between releases
  • Identify regressions automatically

The 3-Layer Detection System

Layer 1: Error Rate Spike Alert

Set up an alert that fires when the error rate increases significantly:

Condition: Error rate > 200% of baseline
Window: 15 minutes
Action: Slack alert to #deploys channel

This catches the most common failure mode: a deploy introduces a bug that affects many requests. A 200% threshold filters out normal fluctuation while catching real problems.
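
Most error trackers can express this rule in their alerting UI. If yours can't, the logic is simple enough to run yourself on a schedule — a minimal sketch, assuming a hypothetical fetchErrorCount() helper that queries your tracker's API and a standard Slack incoming webhook:

// Compare the last 15 minutes of errors against a rolling baseline.
// fetchErrorCount() is a hypothetical helper that queries your error tracker's API.
async function checkErrorSpike() {
  const current = await fetchErrorCount({ windowMinutes: 15 });
  // Average 15-minute error count over the previous 6 hours (24 windows).
  const baseline = (await fetchErrorCount({ windowMinutes: 360 })) / 24;

  if (baseline > 0 && current > baseline * 2) { // > 200% of baseline
    await fetch(process.env.SLACK_WEBHOOK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text: `Error rate spike: ${current} errors in 15 min (baseline ~${Math.round(baseline)})`,
      }),
    });
  }
}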

Layer 2: New Error Type Alert

Condition: New error type appears
Threshold: > 10 events in 30 minutes
Action: Slack alert

This catches new bugs from the deploy — errors that literally didn't exist before. The threshold of 10 events prevents alerting on one-off edge cases.
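
To make "new error type" concrete: an error group is new when its fingerprint wasn't present in the previous release. A sketch of that check, again using a hypothetical fetchErrorGroups() helper:

// An error group is "new" if its fingerprint wasn't seen in the previous release.
// fetchErrorGroups() is a hypothetical helper returning { fingerprint, count } objects.
async function findNewErrors(currentRelease, previousRelease) {
  const known = new Set(
    (await fetchErrorGroups({ release: previousRelease })).map((g) => g.fingerprint)
  );
  const current = await fetchErrorGroups({ release: currentRelease, windowMinutes: 30 });

  // Only surface errors that have crossed the noise threshold (10 events in 30 minutes).
  return current.filter((g) => !known.has(g.fingerprint) && g.count >= 10);
}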

Layer 3: Post-Deploy Health Check (Manual, 5 Minutes)

After every deploy, spend 5 minutes checking your error dashboard:

  1. Check the health indicator — is it still green?
  2. Filter by latest release — any new errors from this version?
  3. Check the event trend — any spike in the last 10 minutes?
  4. Run a smoke test — hit 3-5 critical endpoints manually

This takes 5 minutes and catches issues that automated alerts might miss (like a subtle performance degradation).
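
The smoke test in step 4 is worth scripting so it runs the same way after every deploy. A minimal sketch using Node's built-in fetch — the base URL and paths are placeholders for your own critical flows:

// Hit a handful of critical endpoints and exit non-zero if any of them fail.
// Replace BASE_URL and ENDPOINTS with your app's real critical flows.
const BASE_URL = "https://example.com";
const ENDPOINTS = ["/healthz", "/api/login", "/api/checkout"];

async function smokeTest() {
  for (const path of ENDPOINTS) {
    const res = await fetch(BASE_URL + path);
    if (!res.ok) {
      console.error(`FAIL ${path}: HTTP ${res.status}`);
      process.exitCode = 1;
    } else {
      console.log(`OK   ${path}`);
    }
  }
}

smokeTest();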

The Post-Deploy Checklist

Here's a practical checklist to use after every production deploy:

  • [ ] Deploy completed successfully (CI green)
  • [ ] Error dashboard shows no new critical errors (2-minute check)
  • [ ] Error rate is within normal range (no spike in last 10 minutes)
  • [ ] Critical user flows work (login, checkout, main feature — 3-minute smoke test)
  • [ ] Alert channel is quiet (no new alerts in 5 minutes post-deploy)

If any check fails: roll back immediately, investigate later.

The instinct is to diagnose the problem and push a fix. Resist it. Rolling back takes 2 minutes. Diagnosing and fixing might take 2 hours. Your users shouldn't wait.

The Culture Shift

The biggest change isn't technical — it's cultural. Deploy-and-forget has to become deploy-and-verify. This means:

  1. The person who deploys is responsible for the 5-minute health check — no exceptions
  2. Deploy windows matter — deploying at 4 PM Friday is inherently riskier than 10 AM Tuesday
  3. Rollback is not failure — catching a broken release quickly is a success, not a mistake

Tools That Help

Your error tracking tool should:

  • Support release tagging (most do)
  • Show error rate trends (not just totals)
  • Alert on rate spikes (not just new error types)
  • Have a health indicator you can check in 5 seconds

Bugsly's dashboard shows a green/yellow/red health badge and surfaces the most frequent unresolved errors. Combined with release tagging and spike alerts, you can detect most broken deploys within 5 minutes.

The Math

Average time to detect a broken release:

  • Via user complaints: 2-8 hours
  • Via periodic dashboard checking: 30-60 minutes
  • Via spike alerts + post-deploy check: 5-15 minutes

The difference between 5 minutes and 5 hours is the difference between affecting 100 users and affecting 10,000 users. The setup takes 15 minutes. The ROI is immediate.
