SimpleBackups

Automating DigitalOcean backup verification

You opened the restore dialog and the backup was there. The status said "succeeded." You clicked restore. The server came back missing two weeks of data.

This is one of the most common support situations we see. The backup ran. It completed. It just didn't contain what you expected, or it was silently corrupted somewhere between the dump and the storage destination.

DigitalOcean's native backup tools give you a green checkmark when the process finishes. They don't give you any signal about whether the backup is actually restorable. That's a gap you have to close yourself, and this article shows you how.

By the end you'll have working scripts for DigitalOcean backup verification: Droplet snapshot integrity, database dump health, and Spaces mirror completeness, plus a cron-based pipeline to run all of it automatically and alert you when something is wrong.

Why verification matters more than scheduling

Scheduling is the easy part. You configure a backup to run at 03:00 UTC, it runs, you move on. Most teams stop there.

Verification is the part that actually tells you whether the backup is useful.

A backup can "succeed" and still fail you in any of these ways:

  • The dump file was written, but silently truncated mid-stream because a piped command swallowed a non-zero exit code.
  • The file transferred to storage, but the object is 0 bytes because the upload timed out after the transfer started.
  • The Droplet snapshot completed, but the snapshot ID was deleted by a cleanup script that over-reached.
  • The database dump completed, but pg_restore fails on it because the dump was taken while a schema migration was halfway through.

None of these show up as failures in the backup log. They show up when you try to restore, which is exactly when you can least afford to discover the problem.

The pattern we see

We back up DigitalOcean every day. The failure mode that surprises teams most isn't "the backup didn't run." It's "the backup ran fine but the file it produced was unusable." Verification is how you close the gap between those two things.

There's a second reason verification matters: the same-host problem. DigitalOcean native backups, Droplet snapshots, volume snapshots, and managed database backups all live inside your DigitalOcean account. Verification tells you the backup exists and is intact. It doesn't tell you whether you can access it during an account-level incident. Test the off-site copy, not just the native one. For the full picture on why that distinction matters for compliance, see DigitalOcean off-site backup and compliance.

Verifying Droplet snapshot integrity

DigitalOcean Droplet snapshots are opaque images. You can't mount them or run a checksum against the underlying data without restoring them. What you can verify, programmatically, is:

  1. The snapshot exists by its expected ID or name.
  2. Its status is available (not pending or deleted).
  3. Its size is non-zero and within a reasonable range of your previous snapshot.
  4. It was created within your expected time window.

The doctl CLI gives you everything you need to script this. Install it, authenticate with your personal access token, and you can query snapshot state directly.

#!/usr/bin/env bash
# verify-droplet-snapshot.sh
# Checks that a Droplet's latest snapshot exists, is available,
# and falls within an expected size range.
# Usage: DROPLET_ID=123456789 bash verify-droplet-snapshot.sh

set -euo pipefail

DROPLET_ID="${DROPLET_ID:?DROPLET_ID is required}"
MAX_AGE_HOURS="${MAX_AGE_HOURS:-26}"     # alert if snapshot is older than this
MIN_SIZE_GB="${MIN_SIZE_GB:-1}"          # alert if snapshot is smaller than this

# Fetch the first snapshot whose row mentions this Droplet. The match
# assumes the Droplet ID appears somewhere in the listed columns;
# adjust the awk filter to fit your snapshot naming scheme.
SNAPSHOT=$(doctl compute snapshot list \
  --resource-type droplet \
  --format ID,Name,CreatedAt,SizeGigabytes,Status \
  --no-header \
  | awk -v id="$DROPLET_ID" '$0 ~ id {print; exit}')

if [[ -z "$SNAPSHOT" ]]; then
  echo "ERROR: No snapshot found for Droplet $DROPLET_ID" >&2
  exit 1
fi

STATUS=$(echo "$SNAPSHOT" | awk '{print $5}')
SIZE=$(echo "$SNAPSHOT" | awk '{print $4}')
CREATED=$(echo "$SNAPSHOT" | awk '{print $3}')

if [[ "$STATUS" != "available" ]]; then
  echo "ERROR: Snapshot status is '$STATUS', expected 'available'" >&2
  exit 1
fi

if (( $(echo "$SIZE < $MIN_SIZE_GB" | bc -l) )); then
  echo "ERROR: Snapshot size ${SIZE}GB is below minimum ${MIN_SIZE_GB}GB" >&2
  exit 1
fi

CREATED_EPOCH=$(date -d "$CREATED" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%SZ" "$CREATED" +%s)
NOW_EPOCH=$(date +%s)
AGE_HOURS=$(( (NOW_EPOCH - CREATED_EPOCH) / 3600 ))

if (( AGE_HOURS > MAX_AGE_HOURS )); then
  echo "ERROR: Snapshot is ${AGE_HOURS}h old, max is ${MAX_AGE_HOURS}h" >&2
  exit 1
fi

echo "OK: Snapshot available, ${SIZE}GB, ${AGE_HOURS}h old"

Run this as a cron job a few hours after your snapshot window ends. If it exits non-zero, treat it the same way you'd treat a failed backup: investigate before the next scheduled run.

doctl must be authenticated with a token that has read access to Droplets and Snapshots. Use a read-only personal access token for verification scripts: it limits blast radius if the script's environment is compromised.

For a deeper look at how DigitalOcean's native snapshot mechanism works and what it actually captures, see how DigitalOcean native backup works.

Verifying database dump integrity

Database dumps are different from snapshots: you can inspect them directly. A pg_dump output in custom format (-Fc) has a table of contents you can read without restoring the whole dump. A MySQL dump is plain SQL you can parse. The key insight is that you don't need to do a full restore to know whether a dump is good.

Postgres dump verification

Three checks give you high confidence in a Postgres dump:

  1. Exit code: did the dump command itself exit 0?
  2. File size: is the output file larger than a credible minimum?
  3. Dry-run restore listing: does pg_restore --list parse the table of contents without errors?

The third check is the most important. pg_restore --list reads the dump's internal structure and prints it without writing to any database. If the dump is truncated, corrupt, or written in the wrong format, this fails. If it succeeds, you know the dump is structurally intact.

#!/usr/bin/env bash
# verify-pg-dump.sh
# Verifies a pg_dump file produced in custom format (-Fc).
# Usage: DUMP_FILE=/path/to/dump.dump bash verify-pg-dump.sh

set -euo pipefail

DUMP_FILE="${DUMP_FILE:?DUMP_FILE is required}"
MIN_SIZE_BYTES="${MIN_SIZE_BYTES:-10240}"   # 10 KB minimum; tune for your DB

# 1. Check the file exists
if [[ ! -f "$DUMP_FILE" ]]; then
  echo "ERROR: Dump file not found: $DUMP_FILE" >&2
  exit 1
fi

# 2. Check file size
ACTUAL_SIZE=$(stat -c%s "$DUMP_FILE" 2>/dev/null || stat -f%z "$DUMP_FILE")
if (( ACTUAL_SIZE < MIN_SIZE_BYTES )); then
  echo "ERROR: Dump file is ${ACTUAL_SIZE} bytes, minimum is ${MIN_SIZE_BYTES}" >&2
  exit 1
fi

# 3. Dry-run restore listing (no DB connection required)
if ! pg_restore --list "$DUMP_FILE" > /dev/null 2>&1; then
  echo "ERROR: pg_restore --list failed — dump may be corrupt or truncated" >&2
  exit 1
fi

echo "OK: Dump valid, ${ACTUAL_SIZE} bytes, pg_restore --list passed"

MySQL dump verification

For MySQL dumps (plain SQL format), replace the pg_restore --list check with a header inspection:

#!/usr/bin/env bash
# verify-mysql-dump.sh
# Checks that a mysqldump file has a valid header and non-trivial size.
# Usage: DUMP_FILE=/path/to/dump.sql bash verify-mysql-dump.sh

set -euo pipefail

DUMP_FILE="${DUMP_FILE:?DUMP_FILE is required}"
MIN_SIZE_BYTES="${MIN_SIZE_BYTES:-10240}"

if [[ ! -f "$DUMP_FILE" ]]; then
  echo "ERROR: Dump file not found: $DUMP_FILE" >&2
  exit 1
fi

ACTUAL_SIZE=$(stat -c%s "$DUMP_FILE" 2>/dev/null || stat -f%z "$DUMP_FILE")
if (( ACTUAL_SIZE < MIN_SIZE_BYTES )); then
  echo "ERROR: Dump is ${ACTUAL_SIZE} bytes, minimum is ${MIN_SIZE_BYTES}" >&2
  exit 1
fi

# mysqldump output normally starts with "-- MySQL dump"; note that
# flags like --skip-comments or --compact suppress this header
if ! head -3 "$DUMP_FILE" | grep -q "MySQL dump"; then
  echo "ERROR: File does not look like a valid mysqldump output" >&2
  exit 1
fi

echo "OK: MySQL dump valid, ${ACTUAL_SIZE} bytes"

Run the verification script on the machine that downloads the dump from storage, ideally a separate verify host. Never verify a file in-place on the same server that produced it; a filesystem error affecting the write could affect the read too.
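A minimal sketch of that download-then-verify flow, assuming the dump lives in a Spaces bucket and the script path from earlier in this article; the bucket name, key, and scratch directory are illustrative:

```shell
#!/usr/bin/env bash
# fetch-and-verify.sh: hypothetical wrapper. Downloads the dump from Spaces
# to a scratch directory on the verify host, then runs the dump checks
# against the downloaded copy, never the original.
set -euo pipefail

fetch_and_verify() {
  local bucket="$1" key="$2"
  local dest="/tmp/verify/${key##*/}"   # always verify a fresh local copy
  mkdir -p /tmp/verify
  aws s3 cp "s3://${bucket}/${key}" "$dest" \
    --endpoint-url "${SPACES_ENDPOINT:?SPACES_ENDPOINT is required}"
  DUMP_FILE="$dest" bash /opt/verify/verify-pg-dump.sh
}

# Example (illustrative): fetch_and_verify my-backups postgres/latest.dump
```

Because the copy is pulled back out of storage, this also exercises the download path, which is the path a real restore would take.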

If you're backing up your managed databases off-site, the process for getting a dump file to verify against is covered in backing up DigitalOcean managed databases.

Verifying Spaces mirror completeness

Spaces mirrors are trickier. There's no "table of contents" format to parse. What you can do is compare object counts and checksums between the source and the mirror, and alert when they diverge.

The S3-compatible Spaces API (compatible with the AWS CLI and s3cmd) gives you everything you need. The script below uses the AWS CLI configured for Spaces (--endpoint-url).

#!/usr/bin/env bash
# verify-spaces-mirror.sh
# Compares object count and spot-checks ETags between a source Spaces bucket
# and a mirror bucket. Exits non-zero if counts diverge by more than a threshold.
# Usage:
#   SOURCE_BUCKET=my-source
#   MIRROR_BUCKET=my-mirror
#   SPACES_ENDPOINT=https://nyc3.digitaloceanspaces.com
#   bash verify-spaces-mirror.sh

set -euo pipefail

SOURCE_BUCKET="${SOURCE_BUCKET:?SOURCE_BUCKET is required}"
MIRROR_BUCKET="${MIRROR_BUCKET:?MIRROR_BUCKET is required}"
SPACES_ENDPOINT="${SPACES_ENDPOINT:?SPACES_ENDPOINT is required}"
MAX_DRIFT_PCT="${MAX_DRIFT_PCT:-5}"   # alert if mirror has >5% fewer objects

awss3() {
  aws s3api "$@" --endpoint-url "$SPACES_ENDPOINT"
}

echo "Counting objects in source: $SOURCE_BUCKET"
SOURCE_COUNT=$(awss3 list-objects-v2 \
  --bucket "$SOURCE_BUCKET" \
  --query 'length(Contents[])' \
  --output text)

echo "Counting objects in mirror: $MIRROR_BUCKET"
MIRROR_COUNT=$(awss3 list-objects-v2 \
  --bucket "$MIRROR_BUCKET" \
  --query 'length(Contents[])' \
  --output text)

# An empty bucket has no Contents key, so the query prints "None"
SOURCE_COUNT=${SOURCE_COUNT/None/0}
MIRROR_COUNT=${MIRROR_COUNT/None/0}

echo "Source: $SOURCE_COUNT objects | Mirror: $MIRROR_COUNT objects"

if (( SOURCE_COUNT == 0 )); then
  echo "ERROR: Source bucket is empty — possible misconfiguration" >&2
  exit 1
fi

# Calculate drift percentage
DRIFT_PCT=$(echo "scale=2; (($SOURCE_COUNT - $MIRROR_COUNT) * 100) / $SOURCE_COUNT" | bc)

if (( $(echo "$DRIFT_PCT > $MAX_DRIFT_PCT" | bc -l) )); then
  echo "ERROR: Mirror has ${DRIFT_PCT}% fewer objects than source (threshold: ${MAX_DRIFT_PCT}%)" >&2
  exit 1
fi

# Spot-check: compare the ETag of one sample object (the first key listed)
SAMPLE_KEY=$(awss3 list-objects-v2 \
  --bucket "$SOURCE_BUCKET" \
  --max-items 1 \
  --query 'Contents[0].Key' \
  --output text)

SOURCE_ETAG=$(awss3 head-object \
  --bucket "$SOURCE_BUCKET" \
  --key "$SAMPLE_KEY" \
  --query 'ETag' \
  --output text)

MIRROR_ETAG=$(awss3 head-object \
  --bucket "$MIRROR_BUCKET" \
  --key "$SAMPLE_KEY" \
  --query 'ETag' \
  --output text)

if [[ "$SOURCE_ETAG" != "$MIRROR_ETAG" ]]; then
  echo "ERROR: ETag mismatch on key '$SAMPLE_KEY'" >&2
  echo "  Source:  $SOURCE_ETAG" >&2
  echo "  Mirror:  $MIRROR_ETAG" >&2
  exit 1
fi

echo "OK: Mirror within threshold (${DRIFT_PCT}% drift), ETag spot-check passed"

Reading the results

The ETag check is a fast proxy for checksum comparison. For single-part uploads, the ETag is the object's MD5. For multipart uploads it's a composite hash of the per-part MD5s, which means two byte-identical files can carry different ETags if they were uploaded with different part sizes. Treat a mismatch as a signal to investigate rather than proof of corruption, and keep part sizes consistent between the original upload and the mirror if you want the comparison to be exact.
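If you want a comparison stronger than trusting ETags at face value, you can reproduce the S3-style ETag for a local copy of the object and compare it yourself. This is a sketch of the widely documented algorithm (plain MD5 for single-part objects; MD5 of the concatenated per-part binary MD5s, suffixed with the part count, for multipart). The 8 MB default is an assumption that must match the uploader's part size, and the function relies on md5sum and xxd being installed:

```shell
# Compute the S3-style ETag of a local file for a given multipart part size.
# Single-part files get a plain MD5; multipart files get
# md5(md5(part1) .. md5(partN)) followed by "-N".
compute_etag() {
  local file="$1" part_size="${2:-8388608}"   # 8 MB is a common default; verify yours
  local size parts
  size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
  parts=$(( (size + part_size - 1) / part_size ))
  if (( parts <= 1 )); then
    md5sum "$file" | awk '{print $1}'
    return
  fi
  for (( i = 0; i < parts; i++ )); do
    # Hex MD5 of each part in order
    dd if="$file" bs="$part_size" skip="$i" count=1 2>/dev/null | md5sum | awk '{print $1}'
  done | xxd -r -p | md5sum | awk -v n="$parts" '{print $1 "-" n}'
}
```

Run it against the downloaded object and compare with the ETag from head-object (strip the surrounding quotes from the API value first).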

For large buckets, listing all objects for an exact count can be slow. Tune MAX_DRIFT_PCT to a value that's tight enough to catch real problems but loose enough to tolerate normal sync lag. For most teams, 2% to 5% is a reasonable starting point.

The broader picture for Spaces: what the mirror is protecting against, and how to structure it for off-site durability, is covered in backing up DigitalOcean Spaces.

Building your DigitalOcean backup verification pipeline

Individual verification scripts are useful. A pipeline that runs them automatically, logs results, and alerts on failure is what closes the loop.

The principle: run verification after the backup window closes, not just immediately after the backup command returns. A Spaces sync that's still in progress when you check will appear incomplete. A managed database backup that DigitalOcean is still writing will have the wrong size. Give the backup time to settle before you verify.

Verify on a schedule independent of the backup schedule, not just immediately after the backup runs. A 30-to-60 minute offset between the backup job and the verification job is a reasonable starting point.

Here's a cron-based pipeline that ties the pieces together. It runs the verification scripts, logs timestamped results to a file, and sends an alert via a webhook if anything fails.

#!/usr/bin/env bash
# verify-pipeline.sh
# Runs all three verification scripts and sends an alert if any fail.
# Set the following env vars before running:
#   ALERT_WEBHOOK_URL  — Slack/PagerDuty/etc. webhook to POST failures to
#   DROPLET_ID         — Droplet to verify snapshot for
#   DUMP_FILE          — Path to the latest pg_dump file
#   SOURCE_BUCKET      — Source Spaces bucket name
#   MIRROR_BUCKET      — Mirror Spaces bucket name
#   SPACES_ENDPOINT    — https://<region>.digitaloceanspaces.com

set -uo pipefail   # deliberately no -e: every check should run even if one fails

LOG_DIR="${LOG_DIR:-/var/log/backup-verify}"
LOG_FILE="$LOG_DIR/$(date +%Y-%m-%d).log"
ALERT_WEBHOOK_URL="${ALERT_WEBHOOK_URL:-}"
FAILED=0

mkdir -p "$LOG_DIR"

log() {
  echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] $*" | tee -a "$LOG_FILE"
}

alert() {
  local message="$1"
  log "ALERT: $message"
  if [[ -n "$ALERT_WEBHOOK_URL" ]]; then
    curl -s -X POST "$ALERT_WEBHOOK_URL" \
      -H 'Content-Type: application/json' \
      -d "{\"text\": \"Backup verification failed: $message\"}" \
      > /dev/null
  fi
}

run_check() {
  local name="$1"
  local script="$2"
  log "Running: $name"
  if output=$(bash "$script" 2>&1); then
    log "PASS [$name]: $output"
  else
    FAILED=1
    alert "$name: $output"
  fi
}

run_check "droplet-snapshot" "/opt/verify/verify-droplet-snapshot.sh"
run_check "pg-dump"          "/opt/verify/verify-pg-dump.sh"
run_check "spaces-mirror"    "/opt/verify/verify-spaces-mirror.sh"

if (( FAILED == 1 )); then
  log "Pipeline complete: ONE OR MORE CHECKS FAILED"
  exit 1
else
  log "Pipeline complete: all checks passed"
fi

Schedule it in cron. If your backup runs at 02:00 UTC, run verification at 04:00:

0 4 * * * DROPLET_ID=123456789 DUMP_FILE=/backups/latest.dump SOURCE_BUCKET=my-data MIRROR_BUCKET=my-data-mirror SPACES_ENDPOINT=https://nyc3.digitaloceanspaces.com ALERT_WEBHOOK_URL=https://hooks.slack.com/... bash /opt/verify/verify-pipeline.sh

Crontab entries must be a single line, which is why the inline form gets unwieldy fast. For production use, store the env vars in a dotfile the job sources rather than inline in the crontab.
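One way to structure that, with illustrative paths and values: keep the variables in a root-owned env file, export them so child processes inherit, and have the crontab entry source the file before running the pipeline.

```shell
# /etc/backup-verify.env (illustrative): chmod 600, owned by root.
# Exported so the pipeline and the scripts it calls inherit them.
export DROPLET_ID=123456789
export DUMP_FILE=/backups/latest.dump
export SOURCE_BUCKET=my-data
export MIRROR_BUCKET=my-data-mirror
export SPACES_ENDPOINT=https://nyc3.digitaloceanspaces.com
export ALERT_WEBHOOK_URL=https://hooks.slack.com/...

# Crontab entry: source the env file, then run the pipeline
# 0 4 * * * . /etc/backup-verify.env && bash /opt/verify/verify-pipeline.sh
```

This keeps credentials and IDs out of the crontab itself, which often ends up in config management or backups of its own.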

Verification methods by backup type

Backup type | What to verify | Primary method | Tooling
--- | --- | --- | ---
Droplet snapshot | Exists, status is available, size is non-zero, age within window | Query snapshot list via doctl | doctl compute snapshot list
Droplet volume snapshot | Same as Droplet snapshot | Query volume snapshot list via doctl | doctl compute snapshot list --resource-type volume
Postgres dump (-Fc) | File exists, size above minimum, table of contents parses | pg_restore --list dry run | pg_restore (no DB connection needed)
MySQL dump (plain SQL) | File exists, size above minimum, header is valid mysqldump | Header grep + size check | head, stat
Spaces mirror | Object count within drift threshold, ETag spot-check on sample | Object list count comparison + ETag | AWS CLI (s3api) against Spaces endpoint
Managed DB (native) | Backup exists within retention window | Managed database API query | doctl databases backups list

The managed database row is worth calling out. DigitalOcean doesn't expose the actual dump files from native managed database backups, so you can't run pg_restore --list against them. The best you can do natively is confirm that a backup entry exists for the expected date. For actual dump-level verification, you need to produce the dump yourself (or let a tool like SimpleBackups do it) so you have a file to test against.
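That existence check can be sketched as a small function. The subcommand form below follows the table above, but doctl syntax and output columns can vary by version, so confirm against `doctl databases backups --help` before relying on it:

```shell
# Hypothetical check: confirm the managed cluster has at least one
# native backup entry listed. Assumes the subcommand form referenced
# in the table above; verify it against your doctl version.
check_managed_backup() {
  local cluster_id="$1"
  local count
  count=$(doctl databases backups list "$cluster_id" --no-header 2>/dev/null | wc -l)
  if (( count == 0 )); then
    echo "ERROR: no native backup entries for cluster $cluster_id" >&2
    return 1
  fi
  echo "OK: $count native backup entries for cluster $cluster_id"
}
```

Pair it with a dump you produce yourself if you need anything deeper than "an entry exists."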

When verification fails: the decision tree

Verification catches the problem. It doesn't fix it. Here's how to think through a failure.

Step 1: Identify which check failed.

Each verification script exits non-zero with a specific message. Read the log. "Snapshot not found" is different from "snapshot found but size is 1 KB" is different from "pg_restore --list failed."

Step 2: Determine if it's a transient or structural failure.

A transient failure is something like "snapshot still pending" or "mirror sync in progress." Wait an hour and re-run. If the same check fails twice in a row, treat it as structural.

A structural failure means something went wrong with the backup itself. This is the scenario you were hoping to catch before you needed the data back.

Step 3: Check your backup window.

Is the most recent clean backup within your recovery point objective? If you have 7-day retention and the last clean backup was 5 days ago, you're still within window. If your last clean backup was 8 days ago, you have a gap.

Step 4: Trigger a manual backup immediately.

Don't wait for the next scheduled run. Take a snapshot or dump now, verify it manually, confirm you have at least one clean restore point.

Step 5: Investigate the root cause before re-enabling automation.

A backup that fails verification once might fail silently every time. Common causes:

  • Disk full on the backup server (dump truncated mid-write).
  • Network timeout during the Spaces upload (object written but 0 bytes).
  • Schema migration running during the dump window (dump structurally incomplete).
  • Snapshot quota reached (new snapshot not created, old one not rotated).

Fix the root cause. Re-run verification manually before re-enabling the schedule.

The off-site question

A verified backup inside your DigitalOcean account is better than an unverified one. But if the account is suspended, the billing method fails, or a region goes down, all of it goes away at once. Verification of your off-site copy is the only verification that matters when the account itself is the incident. See the off-site compliance guide linked earlier in this article for how to structure a defensible backup posture.

What to do tonight

Pick one backup type you're running today: Postgres dump, Droplet snapshot, or Spaces mirror. Write the corresponding verification script from this article into your environment, run it manually against your last backup, and see if it passes.

If it passes, set up the cron schedule and move on. If it fails, you just found out before you needed the data back. That's exactly what verification is for.

If scripting and scheduling all this yourself sounds like a second job, SimpleBackups handles DigitalOcean Droplet, database, and Spaces backups off-site, with alerts when a run fails and restore testing built in. See how it works →

FAQ

How do I test if my DigitalOcean backup actually works?

For database dumps, run pg_restore --list (Postgres) or a header check (MySQL) against the file without connecting to a database. For Droplet snapshots, use doctl compute snapshot list to confirm the snapshot exists, is available, and is a non-trivial size. For Spaces mirrors, compare object counts between source and mirror buckets. A full restore to a staging environment is the most complete test, but the lightweight checks above catch the majority of failures without the overhead.

Can I automate backup restore testing?

Yes, though it takes more infrastructure. The pattern is: spin up a temporary Droplet or database cluster from the latest snapshot or dump, run a smoke test (a query that returns a known row count, or a quick application health check), then destroy the resource. This is sometimes called a "canary restore." It's the highest-confidence verification available, at the cost of compute time and complexity. For most teams, the lighter-weight checks in this article are a practical starting point.
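One way to sketch that canary pattern with doctl, under heavy assumptions: the region, size, and smoke-test command below are placeholders, and production use needs error handling plus a teardown that runs even when the test aborts early.

```shell
# Hypothetical canary restore: boot a throwaway Droplet from a snapshot,
# smoke-test it over SSH, then destroy it. All names, sizes, and the
# smoke-test command are placeholders.
canary_restore() {
  local snapshot_id="$1"
  local droplet_id
  droplet_id=$(doctl compute droplet create "canary-$(date +%s)" \
    --image "$snapshot_id" \
    --size s-1vcpu-1gb \
    --region nyc3 \
    --wait \
    --format ID --no-header)

  # Smoke test: replace with something that proves real data came back,
  # e.g. a SELECT count(*) against a table with a known row count.
  if doctl compute ssh "$droplet_id" --ssh-command "systemctl is-active postgresql"; then
    echo "OK: canary restore passed"
  else
    echo "ERROR: canary restore failed smoke test" >&2
  fi

  # Always tear down the canary, pass or fail
  doctl compute droplet delete "$droplet_id" --force
}
```

A weekly canary alongside daily lightweight checks is a reasonable balance of confidence and cost.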

How often should I verify my backups?

Verify every backup, automatically. The scripts above are fast enough to run daily with negligible overhead. The key rule: run verification on an independent schedule, not just immediately after the backup job. An offset of 30 to 60 minutes between the backup window closing and the verification job running gives slow syncs time to complete and avoids false negatives.

Does SimpleBackups include backup verification?

SimpleBackups runs verification checks on backups it manages and alerts you when a run fails or produces an unexpectedly small file. For managed databases, it produces actual dump files you can inspect and restore independently, unlike DigitalOcean's native backups which are opaque. You can also trigger on-demand restore tests from the dashboard.

What should I do if verification fails?

Don't wait for the next scheduled run. First, re-run the check to rule out a transient issue (sync lag, snapshot still pending). If the failure persists, treat it as a structural problem: take a manual backup immediately and verify it by hand, confirm you have at least one clean restore point within your recovery window, then investigate the root cause before re-enabling automation. The most common causes are disk full on the backup host, network timeouts during upload, and schema migrations running during the dump window.


This article is part of The complete guide to DigitalOcean backup, an honest, practical reference from the team that backs up DigitalOcean every day.