
First, let’s assume there is no budget for replicas, so this is a single-node MariaDB instance. Otherwise the answer would be very different!
In this scenario, the ideal way to back up a huge MariaDB datadir is a filesystem snapshot. On LVM (or with the ZFS equivalent), lvcreate --snapshot is near-instant and atomic at the block level. Pair it with fsfreeze so the filesystem is quiesced and its journal flushed before the snapshot is taken; a restore then starts from a consistent image instead of going through a lengthy crash recovery.
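As a rough sketch of that approach, assuming an LVM volume group vg0 with the datadir on a logical volume named mariadb mounted at /var/lib/mysql (all names are hypothetical, and the commands need root plus free extents in the VG):

```shell
#!/bin/bash
set -euo pipefail

# Freeze the filesystem so the snapshot captures a quiescent, flushed state.
fsfreeze -f /var/lib/mysql

# Block-level, atomic, near-instant snapshot; MariaDB stalls only for the
# duration of the freeze window.
lvcreate --snapshot --size 10G --name mariadb-snap vg0/mariadb

# Unfreeze immediately; the snapshot can be copied off-host at leisure.
fsfreeze -u /var/lib/mysql

# Mount the snapshot read-only and stream the datadir to S3.
mount -o ro /dev/vg0/mariadb-snap /mnt/snap
tar -C /mnt/snap -cf - . | lz4 -1 | aws s3 cp - "s3://bucket/snap-$(date +%F).tar.lz4"
umount /mnt/snap
lvremove -f vg0/mariadb-snap
```

In production you would wrap the unfreeze in a trap, so a failed lvcreate cannot leave the filesystem frozen and the database stalled.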
What if that is not an option, though? The alternative everyone knows is mariadb-dump, but on complex databases over 100 GB a logical dump can take hours to run.
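For reference, the mariadb-dump baseline being compared against looks roughly like this (the bucket name is illustrative; --single-transaction keeps the dump consistent for InnoDB tables without locking them):

```shell
mariadb-dump --all-databases --single-transaction --quick --routines --events \
  | lz4 -1 \
  | aws s3 cp - "s3://bucket/dump-$(date +%F).sql.lz4"
```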
Below is my script, which uses mariabackup instead; it is substantially faster than mariadb-dump on large datasets. As a bonus, the output is compressed cheaply with lz4 and piped directly to S3. The script includes a fair amount of validation because mariabackup still requires a --target-dir for temporary files even when streaming, and that directory has to be cleaned up safely:
#!/bin/bash
set -euo pipefail

if ! command -v mktemp >/dev/null 2>&1; then
    echo "mktemp command required" >&2
    exit 1
fi

TMP=$(mktemp -d -t mariabackup-XXXXXX)

cleanup() {
    if [[ "$TMP" == /tmp/mariabackup-* ]]; then
        rm -rf "$TMP"
    else
        echo "invalid tmp dir: $TMP, skipping cleanup" >&2
    fi
}
trap cleanup EXIT

S3="s3://bucket/backup-$(date +%F_%H-%M-%S).xbstream.lz4"
echo "starting backup: $S3 tmp dir: $TMP"

mariabackup \
    --backup \
    --stream=xbstream \
    --target-dir="$TMP" \
    | lz4 -1 \
    | aws s3 cp - "$S3"

echo "backup complete: $S3"
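A note on the compression stage: lz4 -1 favors throughput over ratio, which suits a backup flowing through a pipe. The streaming stage can be sanity-checked on its own with a quick round trip (assuming lz4 is installed):

```shell
# Round-trip check of the lz4 streaming stage: compress stdin, decompress, compare.
printf 'hello mariadb\n' > /tmp/lz4-demo.in
lz4 -1 -c /tmp/lz4-demo.in | lz4 -d -c > /tmp/lz4-demo.out
cmp /tmp/lz4-demo.in /tmp/lz4-demo.out && echo "roundtrip ok"
```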
This cannot reasonably compete with near-instant volume snapshots, but in my experience it is over 3x faster than mariadb-dump.
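One caveat worth remembering: a backup taken this way is not usable until it has been prepared. A hedged sketch of the restore path, assuming a hypothetical /restore directory (mbstream ships with MariaDB for unpacking xbstream archives):

```shell
#!/bin/bash
# Usage: restore.sh s3://bucket/backup-<timestamp>.xbstream.lz4
set -euo pipefail

# Download, decompress, and unpack the stream into an empty directory.
mkdir -p /restore
aws s3 cp "$1" - | lz4 -d | mbstream -x -C /restore

# Apply the redo log so the data files are consistent.
mariabackup --prepare --target-dir=/restore

# Then stop MariaDB, copy the files into the datadir
# (e.g. mariabackup --copy-back --target-dir=/restore),
# and restore ownership (chown -R mysql:mysql on the datadir).
```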