Some of my very rough notes when going nearly cover-to-cover of the PostgreSQL backup and restore documentation section. It’s one of the most detailed pieces of documentation I’ve ever read, so this acts as a very high-level summary. Most of the content is useful for general context surrounding data backups of any kind.
I’m using /data
directory to signify the data storage directory. This depends on the chosen configuration, however.
Backups
Three main types of backup strategies:
- SQL dump/load (stop-the-world)
- Backup
/data
directory (stop-the-world) - Continuous Archiving
The first two typically need lots of extra space on the database server to store the backup before you can upload it to some off-site storage.
“stop-the-world” in this context is not an official nomenclature. In these strategies, most likely, the database needs to be shut down at some point.
Continous archive-based backups are used for a leader-follower setup, or even delta backups—where there’s a base backup, and subsequent data as deltas that can be used to restore the entire database. Application-supported remote backups are quite simple, and so this is the best strategy if the database servers are space-constrained.
SQL Dump/Load
This is the easiest strategy. pg_dump
takes a backup, while pg_restore
command consumes the output of that backup. This strategy is the simplest to cron-ify a backup, without external dependencies: take a backup, upload the files to remote storage, test the backup on a different machine, and do this every night.
-
pg_dump
saves the database into a.sql
statement. Requires large enough space to hold both the database and the backup script. -
File sizes might be limited by kernel/OS, so that’s something to look ahead while deciding to use this.
-
Restore from the
pg_dump
output might also need extra configuration tweaking around connection times: too less, and the database might close the connection before the entire script has run.
Copy /data
directory
PostgreSQL’s directory layout is straight-forward—once you get to know it. Most of the data is put in one directory, and this includes the two main components needed for any future restores: the data files, and the temporary append-only log files. If the database is shut down, you’re free to copy over the data
directory to another machine, and start off from it. Configuration files typically aren’t placed in the data
directory, so they might need to be copied as well.
Any strategies that have to rely on the file-system layout of PostgreSQL, or features provided by the file system itself.
Two routes here: frozen snapshots of the file system, or using tools like rsync, tar etc.
frozen snapshots
-
If the underlying file system supports atomic volume snapshots (btrfs, zfs, Apple’s APFS for example), one can snapshot the entire
data
directory. Lots of caveats around how good the snapshot mechanism is implemented exist. -
The backup can be taken without stopping the server. During restore, this strategy would require replaying the logs as there might be some commits that weren’t turned to data files from the append only log.
rsync
, tar
, et al.
-
rsync
,gzip
,tar
thedata
directory. These utilities don’t take consistent snapshots of the disk, so it’s best to shutdown the server. Shutting down the server forces a full flush of the data to disk. -
An example two step process with
rsync
:
run rsync shutdown the server rsync --checksum
What’s interesting is that this two-step process is similar to the one used in online backups section. This is like a one-step delta backup process if we stretch it enough: the first backup is a base backup that contains the data committed till that point, then the second rsync
takes the delta and copies that over.
Continuous Archiving
This system can be used to setup a replicated system, consisting of a leader and potentially multiple followers. The data from the leader is pushed, and each of the followers might pull the data. Where this data is stored is customisable. There are many ways to setup replication in PostgreSQL, and the documentation for it is exhaustive. The archival part deals with the first part: taking the backup and pushing it somewhere.
This strategy piggy-backs on the fact that a WAL log may be used to replay and restore a database. There are many caveats and configuration tweaks to how long the WAL log files are retained, the size of those log files and the naming of the files. It’s best to ship the log files as and when they are created to an external storage service. Rather than do this manually via rsync
et. al., PostgreSQL provides a way: archive_command
setting in the configuration, which takes a script.
-
WAL logs should be secured while transmission and remote storage, because these contain the actual data. (that goes for the main database too, fwiw)
-
archive_command
should exit with0
code. Otherwise, the command gets retried. Thepg_wal
directory may potentially get filled, and cause the server to crash! -
archive_command
should be designed to ensure it doesn’t override existing files on the remote system. -
Missing WAL logs from the archive might hamper future restore, so regular base backups will help keep the error surface area a little small.
-
old base backup + too many WAL logs to restore increase the restore time. It’s important to determine the maths behind this to figure out how much downtime you might need and tweak the base backup frequency, and WAL file size accordingly.
General mechanism:
- One base backup as a starting point
- Continuous deltas in the form of the append-only log (WAL) files
The base backup marks the point where the backup would start (checkpoint).
base backup
Two ways to take a base backup:
-
Use the
pg_basebackup
command from an external machine (or the same machine, with a different data directory setting), providing the connection info to connect to the leader.-
Multiple commands can be run from multiple machines, but might depend on replication slots configuration on the leader.
-
Might use one/two connections depending on the variant of backup used: copy WAL logs at the end (1) or stream WAL logs parallelly (2).
-
Does not run if
/data
directory is not empty.
-
-
Two-step process via
rsync
. PostgreSQL provides two SQL statements for signalling the server that the user is taking a backup, and that a checkpoint has to be created:pg_start_backup
,pg_stop_backup
.
SELECT pg_start_backup('some_label') rsync /data SELECT * from pg_stop_backup();
Restore
stop-the-world restores
Data dumps taken with pg_dump
or the file system strategy mentioned above can be restored by pg_restore
or just starting the server. Needless to say, this strategy causes either data loss or needs downtime, depending on the operations chosen.
-
If a system has a simple
pg_dump
cron job that ships the archive to remote storage, when the leader crashes or dies, the time to detection, copying the archive to the follower,pg_restore
completion times is the amount of downtime that’s required. -
The cron job, if configured at a certain time in the day, differs from the time the crash happens, the delta in the data until that time on the leader is a potential loss in data.
-
When the leader crashes/dies, but you still have access to the physical data disks, recovery using file system snapshot is possible, and that may potentially recover all the data up till the point of the last commit. Because this recovery would also have the WAL files handy, the replay will make sure as much data as possible is recovered.
Continuous Archive restores
If the system is setup with continuous archiving, it may be possible to recover all the data. Restore times depend on how fast the base backup archive, WAL logs can be copied over to the new server, and the WAL log replay.
Replication
There are many ways to do this, too, depending on the underlying infra: shared disk (two machines accessing the same disk), file system replication (a write to one drive is mirrored to a different machine atomically), side-car middlewares that execute a given statement on multiple machines simultaneously, or even application-level middlewares that do this. Streaming/Point-in-time replication is one preferred approach that can piggy back on the continuous archive backup strategy.
Streaming/Point-in-time replication strategy uses wal logs shipped to a remote server using archive_command
from the leader to be used to replay the logs on a follower continously.
-
Note that in streaming replication is possible without using
archive_command
, provided the data ingestion throughput never exceeds the rate of the follower streaming the logs directly from the leader, and applying them locally (also depends on the network latency).If the follower is not able to keep up with the logs, the logs on leader might get recycled, and the follower will keep waiting for the now-non-existent WAL file. Force-starting the follower in case of failure will result in data loss.