Some of my very rough notes when going nearly cover-to-cover of the PostgreSQL backup and restore documentation section. It’s one of the most detailed pieces of documentation I’ve ever read, so this acts as a very high-level summary. Most of the content is useful for general context surrounding data backups of any kind.
/data directory to signify the data storage directory. This
depends on the chosen configuration, however.
Three main types of backup strategies:
- SQL dump/load (stop-the-world)
- Continuous Archiving
The first two typically need lots of extra space on the database server to store the backup before you can upload it to some off-site storage.
“stop-the-world” in this context is not an official nomenclature. In these strategies, most likely, the database needs to be shut down at some point.
Continous archive-based backups are used for a leader-follower setup, or even delta backups—where there’s a base backup, and subsequent data as deltas that can be used to restore the entire database. Application-supported remote backups are quite simple, and so this is the best strategy if the database servers are space-constrained.
This is the easiest strategy.
pg_dump takes a backup, while
pg_restore command consumes the output of that backup. This strategy
is the simplest to cron-ify a backup, without external dependencies:
take a backup, upload the files to remote storage, test the backup on a
different machine, and do this every night.
pg_dumpsaves the database into a
.sqlstatement. Requires large enough space to hold both the database and the backup script.
File sizes might be limited by kernel/OS, so that’s something to look ahead while deciding to use this.
Restore from the
pg_dumpoutput might also need extra configuration tweaking around connection times: too less, and the database might close the connection before the entire script has run.
PostgreSQL’s directory layout is straight-forward—once you get to know
it. Most of the data is put in one directory, and this includes the two
main components needed for any future restores: the data files, and the
temporary append-only log files. If the database is shut down, you’re
free to copy over the
data directory to another machine, and start off
from it. Configuration files typically aren’t placed in the
directory, so they might need to be copied as well.
Any strategies that have to rely on the file-system layout of PostgreSQL, or features provided by the file system itself.
Two routes here: frozen snapshots of the file system, or using tools like rsync, tar etc.
If the underlying file system supports atomic volume snapshots (btrfs, zfs, Apple’s APFS for example), one can snapshot the entire
datadirectory. Lots of caveats around how good the snapshot mechanism is implemented exist.
The backup can be taken without stopping the server. During restore, this strategy would require replaying the logs as there might be some commits that weren’t turned to data files from the append only log.
tar, et al.
datadirectory. These utilities don’t take consistent snapshots of the disk, so it’s best to shutdown the server. Shutting down the server forces a full flush of the data to disk.
An example two step process with
run rsync shutdown the server rsync --checksum
What’s interesting is that this two-step process is similar to the one
used in online backups section. This is like a one-step delta backup
process if we stretch it enough: the first backup is a base backup that
contains the data committed till that point, then the second
takes the delta and copies that over.
This system can be used to setup a replicated system, consisting of a leader and potentially multiple followers. The data from the leader is pushed, and each of the followers might pull the data. Where this data is stored is customisable. There are many ways to setup replication in PostgreSQL, and the documentation for it is exhaustive. The archival part deals with the first part: taking the backup and pushing it somewhere.
This strategy piggy-backs on the fact that a WAL log may be used to
replay and restore a database. There are many caveats and configuration
tweaks to how long the WAL log files are retained, the size of those log
files and the naming of the files. It’s best to ship the log files as
and when they are created to an external storage service. Rather than do
this manually via
rsync et. al., PostgreSQL provides a way:
archive_command setting in the configuration, which takes a script.
WAL logs should be secured while transmission and remote storage, because these contain the actual data. (that goes for the main database too, fwiw)
archive_commandshould exit with
0code. Otherwise, the command gets retried. The
pg_waldirectory may potentially get filled, and cause the server to crash!
archive_commandshould be designed to ensure it doesn’t override existing files on the remote system.
Missing WAL logs from the archive might hamper future restore, so regular base backups will help keep the error surface area a little small.
old base backup + too many WAL logs to restore increase the restore time. It’s important to determine the maths behind this to figure out how much downtime you might need and tweak the base backup frequency, and WAL file size accordingly.
- One base backup as a starting point
- Continuous deltas in the form of the append-only log (WAL) files
The base backup marks the point where the backup would start (checkpoint).
Two ways to take a base backup:
pg_basebackupcommand from an external machine (or the same machine, with a different data directory setting), providing the connection info to connect to the leader.
Multiple commands can be run from multiple machines, but might depend on replication slots configuration on the leader.
Might use one/two connections depending on the variant of backup used: copy WAL logs at the end (1) or stream WAL logs parallelly (2).
Does not run if
/datadirectory is not empty.
Two-step process via
rsync. PostgreSQL provides two SQL statements for signalling the server that the user is taking a backup, and that a checkpoint has to be created:
SELECT pg_start_backup('some_label') rsync /data SELECT * from pg_stop_backup();
Data dumps taken with
pg_dump or the file system strategy mentioned
above can be restored by
pg_restore or just starting the server.
Needless to say, this strategy causes either data loss or needs
downtime, depending on the operations chosen.
If a system has a simple
pg_dumpcron job that ships the archive to remote storage, when the leader crashes or dies, the time to detection, copying the archive to the follower,
pg_restorecompletion times is the amount of downtime that’s required.
The cron job, if configured at a certain time in the day, differs from the time the crash happens, the delta in the data until that time on the leader is a potential loss in data.
When the leader crashes/dies, but you still have access to the physical data disks, recovery using file system snapshot is possible, and that may potentially recover all the data up till the point of the last commit. Because this recovery would also have the WAL files handy, the replay will make sure as much data as possible is recovered.
Continuous Archive restores
If the system is setup with continuous archiving, it may be possible to recover all the data. Restore times depend on how fast the base backup archive, WAL logs can be copied over to the new server, and the WAL log replay.
There are many ways to do this, too, depending on the underlying infra: shared disk (two machines accessing the same disk), file system replication (a write to one drive is mirrored to a different machine atomically), side-car middlewares that execute a given statement on multiple machines simultaneously, or even application-level middlewares that do this. Streaming/Point-in-time replication is one preferred approach that can piggy back on the continuous archive backup strategy.
Streaming/Point-in-time replication strategy uses wal logs shipped to a
remote server using
archive_command from the leader to be used to
replay the logs on a follower continously.
Note that in streaming replication is possible without using
archive_command, provided the data ingestion throughput never exceeds the rate of the follower streaming the logs directly from the leader, and applying them locally (also depends on the network latency).
If the follower is not able to keep up with the logs, the logs on leader might get recycled, and the follower will keep waiting for the now-non-existent WAL file. Force-starting the follower in case of failure will result in data loss.