
Cron Monitoring: How to Track Scheduled Jobs

A guide to monitoring cron jobs and scheduled tasks, covering check-in patterns, failure detection, and integration with error tracking.

The Problem with Cron Jobs

Cron jobs are the silent workhorses of most applications. They process payments, send emails, clean up data, generate reports, and sync external systems. When they work, nobody notices. When they fail, the consequences can be severe: missed invoices, stale data, broken integrations.

The core problem is that cron job failures are silent by default. If your nightly data export job stops running, nothing alerts you: the job simply does not execute, and you only find out when someone asks where the report is.

What Is Cron Monitoring?

Cron monitoring (also called check-in monitoring or heartbeat monitoring) works by expecting your cron jobs to "check in" at regular intervals. If a check-in is missed, the monitoring system alerts you.

The pattern is simple:

  1. You define a monitor with an expected schedule (e.g., every hour)
  2. Your cron job sends a check-in when it starts and when it completes
  3. If the monitoring system does not receive a check-in on time, it alerts you
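To make the pattern concrete, here is a minimal in-memory sketch of the monitoring side, assuming a fixed-interval schedule. HeartbeatMonitor and its methods are hypothetical illustrations, not Bugsly's API. Each check-in records a timestamp, and the monitor treats the job as missed once the interval plus the grace period passes without a new check-in.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Hypothetical in-memory monitor for a job expected every `interval`."""

    def __init__(self, interval: timedelta, grace: timedelta, started_at: datetime):
        self.interval = interval
        self.grace = grace
        # Use monitor creation as the baseline until the first check-in arrives
        self.last_check_in = started_at

    def check_in(self, at: datetime) -> None:
        # Record the time of the most recent check-in
        self.last_check_in = at

    def deadline(self) -> datetime:
        # Latest acceptable time for the next check-in
        return self.last_check_in + self.interval + self.grace

    def is_missed(self, now: datetime) -> bool:
        # Alert once the deadline has passed with no new check-in
        return now > self.deadline()


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(timedelta(hours=1), timedelta(minutes=15), start)
monitor.check_in(start + timedelta(hours=1))
print(monitor.is_missed(start + timedelta(hours=2, minutes=10)))  # False: still within grace
print(monitor.is_missed(start + timedelta(hours=2, minutes=20)))  # True: deadline was 02:15
```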

Setting Up Cron Monitoring

Step 1: Define Your Monitors

In your monitoring tool (like Bugsly), create a monitor for each cron job:

  • Name: "Nightly Data Export"
  • Schedule: Every day at 2:00 AM UTC
  • Grace period: 15 minutes (allows for slight timing variations)
  • Alert after: 1 missed check-in
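For a daily schedule like this one, the alert deadline is the next 2:00 AM UTC run plus the 15-minute grace period. The sketch below shows that arithmetic using only the standard library. The helpers next_daily_run and alert_deadline are hypothetical names, and a real monitoring tool would typically parse full cron expressions rather than a single daily time.

```python
from datetime import datetime, time, timedelta, timezone

SCHEDULE_AT = time(2, 0)       # every day at 2:00 AM UTC
GRACE = timedelta(minutes=15)  # tolerated lateness before alerting


def next_daily_run(now: datetime, at: time = SCHEDULE_AT) -> datetime:
    """Return the next occurrence of `at` (UTC) strictly after `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)  # normalize so .date() is the UTC date
    candidate = datetime.combine(now.date(), at, tzinfo=timezone.utc)
    if candidate <= now:
        # Today's slot has already passed; the next run is tomorrow
        candidate += timedelta(days=1)
    return candidate


def alert_deadline(now: datetime) -> datetime:
    """Latest time the next run's check-in may arrive before a missed-check-in alert."""
    return next_daily_run(now) + GRACE


now = datetime(2024, 3, 10, 14, 30, tzinfo=timezone.utc)
print(alert_deadline(now))  # 2024-03-11 02:15:00+00:00
```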

Step 2: Add Check-Ins to Your Jobs

import bugsly


def nightly_export():
    monitor_id = "nightly-data-export"

    # Check in: job started
    bugsly.monitor.check_in(monitor_id, status="in_progress")

    try:
        # Do the actual work (fetch_data and export_to_s3 are your own job's functions)
        data = fetch_data()
        export_to_s3(data)
    except Exception as e:
        # Check in: job failed, and send the exception to error tracking
        bugsly.monitor.check_in(monitor_id, status="error")
        bugsly.capture_exception(e)
        raise
    else:
        # Check in: job completed successfully (only reached if no exception was raised)
        bugsly.monitor.check_in(monitor_id, status="ok")
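If you have several jobs, the same start/success/failure sequence can be factored into a decorator. This is a sketch, not part of Bugsly's SDK: monitored is a hypothetical helper, and the check_in and capture callables are injected so you could pass bugsly.monitor.check_in and bugsly.capture_exception from the example above, or simple stubs in tests.

```python
import functools
from typing import Any, Callable


def monitored(monitor_id: str,
              check_in: Callable[..., Any],
              capture: Callable[[BaseException], Any]):
    """Wrap a job so it reports in_progress, then ok or error (hypothetical helper)."""
    def decorator(job: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(job)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            check_in(monitor_id, status="in_progress")
            try:
                result = job(*args, **kwargs)
            except Exception as exc:
                # Report the failed run, forward the exception, then re-raise it
                check_in(monitor_id, status="error")
                capture(exc)
                raise
            check_in(monitor_id, status="ok")
            return result
        return wrapper
    return decorator


# Usage with stand-ins that only record what would have been sent
events = []


def record_check_in(monitor_id, status):
    events.append((monitor_id, status))


def record_capture(exc):
    events.append(("captured", type(exc).__name__))


@monitored("nightly-data-export", check_in=record_check_in, capture=record_capture)
def nightly_export():
    return "exported"


nightly_export()
print(events)  # [('nightly-data-export', 'in_progress'), ('nightly-data-export', 'ok')]
```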

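For simpler setups, you can use a URL-based check-in (ping monitoring). Because of the &&, the ping is sent only if the script exits successfully, so a failed run shows up as a missed check-in: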

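# In your crontab. Use absolute paths (replace /path/to/ with your script's location): cron runs jobs with a minimal environment.
# curl -f fails on HTTP errors, -sS hides progress but shows errors, -m 10 caps the request at 10 seconds.
0 2 * * * /usr/bin/python3 /path/to/export.py && curl -fsS -m 10 --retry 3 -o /dev/null https://monitor.bugsly.dev/ping/abc123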
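If you would rather keep the ping logic out of the crontab line, a small wrapper can run the job and hit the ping URL only when it exits with status 0, mirroring the && in the crontab entry. This is a sketch using only the standard library; run_and_ping is a hypothetical name, the ping function is injectable so it can be tested without network access, and the 10-second timeout is an arbitrary choice.

```python
import subprocess
import sys
import urllib.request
from typing import Callable, Sequence

PING_URL = "https://monitor.bugsly.dev/ping/abc123"  # URL from the crontab example


def send_ping(url: str) -> None:
    # Plain GET; urlopen raises on HTTP errors, and the timeout keeps a slow monitor from hanging the job
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def run_and_ping(cmd: Sequence[str], url: str,
                 ping: Callable[[str], None] = send_ping) -> int:
    """Run `cmd`; ping `url` only if it exits with status 0. Returns the exit code."""
    returncode = subprocess.run(list(cmd)).returncode
    if returncode == 0:
        ping(url)
    return returncode
```

From an entry-point script you might call run_and_ping([sys.executable, "/path/to/export.py"], PING_URL) and pass the returned code to sys.exit, so cron still sees the job's real exit status.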

Step 3: Configure Alerts

Set up notifications for:

  • Missed check-ins: The job did not run at all
  • Failed check-ins: The job ran but reported an error
  • Duration alerts: The job took longer than expected
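The three alert types can be derived from the check-ins for one scheduled run. A minimal sketch, assuming each run is summarized by its start time, its latest status, and its finish time (if any), and that the function is evaluated after the grace period has elapsed; Run, alerts_for, and the max_duration threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical summary of one job run built from its check-ins."""
    started_at: datetime
    status: str = "in_progress"  # "in_progress", "ok", or "error"
    finished_at: Optional[datetime] = None


def alerts_for(run: Optional[Run], now: datetime, max_duration: timedelta) -> List[str]:
    """Return the alert types a scheduled run should trigger."""
    if run is None:
        # No check-in arrived for this scheduled slot at all
        return ["missed"]
    alerts = []
    if run.status == "error":
        alerts.append("failed")
    # A still-running job is measured against `now`, so a hung job also alerts
    end = run.finished_at or now
    if end - run.started_at > max_duration:
        alerts.append("duration")
    return alerts


t0 = datetime(2024, 1, 1, 2, 0)
limit = timedelta(minutes=30)
print(alerts_for(None, t0, limit))                                          # ['missed']
print(alerts_for(Run(t0, "error", t0 + timedelta(minutes=5)), t0, limit))  # ['failed']
print(alerts_for(Run(t0), t0 + timedelta(hours=1), limit))                 # ['duration']
```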

What to Monitor

Not every cron job needs monitoring. Focus on jobs where failure has consequences:

Critical (Monitor Always)

  • Payment processing jobs
  • Data backup jobs
  • User notification jobs (emails, SMS)
  • Integration sync jobs (third-party APIs)
  • Security jobs (certificate renewal, key rotation)

Important (Monitor Recommended)

  • Report generation
  • Cache warming
  • Data cleanup and archival
  • Search index updates

Low Priority (Optional)

  • Log rotation
  • Temporary file cleanup
  • Analytics aggregation

Common Failure Patterns

1. The Silent Stop

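The most dangerous pattern: the cron daemon stops running jobs without any error. This can happen after server restarts, crontab misconfigurations, or container redeployments. Check-in monitoring catches this as soon as an expected check-in (plus its grace period) fails to arrive, rather than when someone notices the missing output.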

2. The Slow Degradation

A job that normally takes 5 minutes starts taking 30 minutes, then 2 hours. Duration monitoring detects this trend before the job starts timing out.
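One simple way to catch slow degradation is to compare each run's duration against a baseline built from recent runs. This sketch assumes durations in seconds and uses a hypothetical rule: flag any run slower than twice the median of the previous five runs. The factor and window are illustrative, not recommendations from any particular tool.

```python
from statistics import median
from typing import List, Sequence


def flag_slow_runs(durations: Sequence[float], window: int = 5,
                   factor: float = 2.0) -> List[int]:
    """Return indices of runs slower than `factor` x the median of the prior `window` runs."""
    flagged = []
    for i in range(window, len(durations)):
        baseline = median(durations[i - window:i])
        if durations[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Durations in seconds: about 5 minutes, then 15 minutes, 30 minutes, and 2 hours
history = [300, 310, 295, 305, 300, 320, 900, 1800, 7200]
print(flag_slow_runs(history))  # [6, 7, 8]
```

Because the baseline moves with recent runs, a very gradual creep can slip under a relative rule like this; pairing it with a fixed upper limit on duration closes that gap.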

3. The Partial Failure

The job runs but processes only a subset of records due to a query change or data issue. Add success metrics to your check-ins:

bugsly.monitor.check_in(monitor_id, status="ok", context={
    "records_processed": processed_count,
    "records_failed": failed_count,
})
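Counts are most useful when the job also decides whether they are acceptable before checking in. A sketch assuming processed_count counts successfully handled records, failed_count counts failures, and a hypothetical 1% failure-rate threshold; status_for_counts is an illustrative helper whose result would be passed as status= in the check_in call above.

```python
def status_for_counts(processed: int, failed: int,
                      max_failure_rate: float = 0.01,
                      min_expected: int = 1) -> str:
    """Return "ok" or "error" for a run based on its record counts (hypothetical rule)."""
    total = processed + failed
    if total < min_expected:
        # Handling far fewer records than expected often points to a query or data problem
        return "error"
    if failed / total > max_failure_rate:
        return "error"
    return "ok"


print(status_for_counts(990, 10))                     # ok (exactly 1% failed)
print(status_for_counts(980, 20))                     # error (2% failed)
print(status_for_counts(500, 0, min_expected=1000))   # error (only half the expected volume)
```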

4. The Overlapping Run

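A job has not finished by the time its next scheduled run starts. This causes duplicate processing or resource contention. Use the "in_progress" status to detect overlaps: if a new run checks in as in progress while the previous run has not yet reported "ok" or "error", the two runs overlap.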
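On the monitoring side, that rule can be sketched as a counter of open runs for one monitor. OverlapDetector is hypothetical and assumes check-ins arrive in order; each "in_progress" opens a run and each "ok" or "error" closes one.

```python
class OverlapDetector:
    """Track open runs for one monitor and flag overlapping starts (hypothetical)."""

    def __init__(self) -> None:
        self.open_runs = 0

    def check_in(self, status: str) -> bool:
        """Record a check-in; return True if it starts a run that overlaps another."""
        if status == "in_progress":
            overlapping = self.open_runs > 0
            self.open_runs += 1
            return overlapping
        if status in ("ok", "error") and self.open_runs > 0:
            # A completion check-in closes one open run
            self.open_runs -= 1
        return False


detector = OverlapDetector()
print(detector.check_in("in_progress"))  # False: first run starts
print(detector.check_in("in_progress"))  # True: the previous run never finished
```

A run that crashes without reporting would stay open forever in this sketch, so a practical version would also expire open runs after a maximum duration.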

Integration with Error Tracking

Cron monitoring and error tracking work best together. When a cron job fails:

  1. The check-in monitor alerts you that the job failed
  2. The error tracker shows the exception with the full stack trace
  3. AI analysis explains what went wrong
  4. You fix the issue and the next run succeeds

Tools like Bugsly provide both capabilities in a single platform, so you do not need to set up separate monitoring systems.

Try Bugsly Free

AI-powered error tracking that explains your bugs. Set up in 2 minutes, free forever for small projects.
