PostgreSQL pg_dump & pg_restore Guide

Laurent Lemaire

Co-founder, SimpleBackups

Last update on November 29, 2023
Originally posted on December 1, 2020

Backing up and restoring a PostgreSQL database is an essential task for any system administrator. Fortunately, there are built-in pg_dump and pg_restore utilities to make these tasks easier to complete.

Introduction

Using these utilities, administrators are able to create a full, incremental, or continuous backup either locally or remotely.

PostgreSQL is a relational database system that is widely used. The system is open-source and offers a wide array of tools to accomplish tasks quickly.

To help you understand more about these processes we will cover what they are and work through some examples.

It's important to note that the commands in this guide assume you already have a server running Linux with PostgreSQL installed, and that a root password has been set up on the server.

The PostgreSQL pg_dump command

pg_dump

The pg_dump command extracts a PostgreSQL database into a script file or another archive file. This utility is for backing up databases. The utility makes consistent backups even if the database is being used concurrently. Readers, writers, and other users won't be blocked from using the database while using pg_dump.

Only a single database will be dumped. If there is a cluster of databases that need to be dumped, scroll down to the pg_dumpall command.

The output of pg_dump

The outputs are either script or archive files.

A script dump is a plain text file that contains SQL commands which can reconstruct the database to the exact state it was in when it was saved.
Feeding the file to psql restores the database from these scripts. Script files can be used with other databases, and even on different machines and architectures.

With a few modifications to the options, this file type can even be used with other SQL database products.

Archive file formats aren't as universal and must be used by pg_restore to get the database back. While not as generic, this format allows the user to be selective of what gets restored. The user can even reorder items before restoring them, which makes this format ideal for porting the database across architectures.

Output format of pg_dump
There is also flexibility when it comes to archiving and transferring information. Using an archive format with pg_dump and restoring with pg_restore allows for more selective restoration: pg_dump backs up the entire database, and pg_restore then selects which parts of the archive to restore.

Using output file formats like "custom" and "directory" allows for the greatest flexibility. These formats support reordering archived items, parallel restoration, and automatically come compressed. If you want to run parallel dumps, then the "directory" format will be your only choice.
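As a sketch of the directory format in practice, the following runs a dump with several parallel jobs (database name and job count are illustrative):

```shell
# Dump "dangerousdb" in directory format using 4 parallel jobs.
# Directory format is the only format that supports --jobs;
# the output (-f) is a directory, not a single file.
pg_dump -F d -j 4 -f dangerousdb_backup dangerousdb
```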

PostgreSQL statistics collector

The PostgreSQL statistics collector is a subsystem within PostgreSQL designed to collect and provide information about server activity.
This includes a wide range of data, such as the number of rows fetched or affected by queries, the number of index and table scans, the number of tuples inserted, updated, or deleted, and more.

After running pg_dump, be sure to take a look at anything printed to standard error.
Internally, pg_dump executes SELECT statements, so you must be able to select from the database using psql for the operation to work properly. Any environment variables used by front-end libraries such as libpq also apply when running the utility.

The statistics collector in PostgreSQL typically gathers information about the activities performed by pg_dump. However, if you prefer not to have this data collected, you can disable it by setting the track_counts parameter to false. This can be done either through the PGOPTIONS environment variable or by using the ALTER USER command.
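As a sketch, disabling track_counts for just the dump session via PGOPTIONS might look like this (database and file names are illustrative):

```shell
# Disable statistics collection for this pg_dump session only.
PGOPTIONS='-c track_counts=off' pg_dump dangerousdb > db.sql
```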

pg_dump command structure

A standard command will follow this format:

pg_dump [connection-option…] [option…] [dbname]

PostgreSQL's online documentation covers all of the options available for pg_dump; you'll find a few of the most common ones below.

The pg_dump command in PostgreSQL offers a variety of options to control the content and format of the output when backing up a database. Here are some of the key options available:

  1. Basic Options:

    • dbname: Specifies the name of the database to be dumped.
    • -a, --data-only: Dump only the data, not the schema.
    • -b, --large-objects, --blobs (deprecated): Include large objects in the dump.
    • -B, --no-large-objects, --no-blobs (deprecated): Exclude large objects from the dump.
    • -c, --clean: Output commands to DROP all the dumped database objects before creating them.
  2. Formatting and Filtering Options:

    • -C, --create: Begin the output with a command to create the database itself.
    • -e pattern, --extension=pattern: Dump only extensions matching the pattern.
    • -E encoding, --encoding=encoding: Set the character set encoding for the dump.
    • -F format, --format=format: Selects the format of the output (plain, custom, directory, tar).
    • -j njobs, --jobs=njobs: Run the dump in parallel by dumping multiple tables simultaneously.
  3. Schema and Table Selection:

    • -n pattern, --schema=pattern: Dump only schemas matching the pattern.
    • -N pattern, --exclude-schema=pattern: Exclude schemas matching the pattern.
    • -t pattern, --table=pattern: Dump only tables with names matching the pattern.
    • -T pattern, --exclude-table=pattern: Exclude tables matching the pattern.
  4. Advanced and Specialized Options:

    • --inserts, --column-inserts: Dump data as INSERT commands with explicit column names.
    • --disable-triggers: Include commands to temporarily disable triggers on the target tables.
    • --enable-row-security: Dump the contents of a table with row security enabled.
    • --include-foreign-data=foreignserver: Dump the data for foreign tables with a matching foreign server.
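As a hedged illustration of combining several of these options (the schema and database names are hypothetical):

```shell
# Dump only the "sales" schema as column-qualified INSERT statements,
# including commands to disable triggers on the target tables during restore.
pg_dump --schema=sales --column-inserts --disable-triggers dangerousdb > sales.sql
```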

Backing up a single PostgreSQL database

Dumping a database called dangerousdb into an SQL file:

pg_dump dangerousdb > db.sql

Backing up dangerousdb in tar format:

pg_dump -U postgres -F t dangerousdb > dangerousdb.tar

Saving dangerousdb in directory format (which writes to a directory, so -f is used instead of shell redirection):

pg_dump -U postgres -F d -f db1_backup dangerousdb

For large databases, piping the dump through a compression tool like gzip produces a smaller backup file:

pg_dump -U postgres dangerousdb | gzip > dangerousdb.gz
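To restore such a compressed dump, the stream can be decompressed and piped back into psql (database and file names are illustrative):

```shell
# Decompress the dump and feed it straight into psql.
gunzip -c dangerousdb.gz | psql -d dangerousdb
```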

Reloading a script into a newly created database called nowdb:

psql -d nowdb -f db.sql

Dumping a single PostgreSQL table is possible by using the -t option and specifying the database and table name (here: dangerousdb and tallytab):

pg_dump -t tallytab dangerousdb > db.sql

Dumping selected PostgreSQL tables is done with patterns. Here the command dumps all tables whose names start with "sam" in the "ple" schema, but excludes the table "samson":

pg_dump -t 'ple.sam*' -T ple.samson dangerousdb > db.sql


The PostgreSQL pg_restore command

pg_restore

The pg_restore command takes an archive file created by a pg_dump command and restores the selected PostgreSQL database.

When pg_dump is used with one of the non-plain text formats, the restore command will bring the database back. The utility knows how to issue commands in the proper order to make sure the database is reconstructed to the exact state it was in when the save occurred.

Since this format is supposed to be portable across architectures, the user may select what is restored and in what order.

What can be done with the data depends on how the source file was generated. The command can't regenerate data that isn't there, and it can't change the nature of the commands used to create the file.

The pg_restore command in PostgreSQL is used to restore databases from an archive created by pg_dump. Here are some of the most important options for this command:

  1. Basic Options:

    • filename: Specifies the location of the archive file to be restored.
    • -a, --data-only: Restore only the data, not the schema.
    • -c, --clean: Drop all objects before restoring them, useful for overwriting an existing database.
    • -C, --create: Create the database before restoring into it.
    • -d dbname, --dbname=dbname: Connect to a specific database and restore directly into it.
  2. Performance Options:

    • -j number-of-jobs, --jobs=number-of-jobs: Run the most time-consuming steps concurrently, reducing the time to restore a large database.
  3. Filtering Options:

    • -n schema, --schema=schema: Restore only objects in the specified schema.
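A sketch combining these options (schema, file, and database names are illustrative):

```shell
# Restore only the "public" schema from a custom-format archive,
# running the slow steps across 4 parallel jobs.
pg_restore -j 4 -n public -d dangerousdb dangerousdb.dump
```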

pg_restore modes

There are two modes the utility uses. If the user specifies a database name, pg_restore connects to that database and restores content directly into it. If the name is left out, the process creates a script containing the SQL commands needed to rebuild the database.

In the second case, the script is written to standard output and looks similar to the plain-text script that pg_dump would generate.
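For example, invoking pg_restore without -d converts an archive back into a plain SQL script on standard output (file names are illustrative):

```shell
# No -d given: pg_restore emits an SQL script instead of connecting to a database.
pg_restore dbcooper.tar > dbcooper.sql
```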

pg_restore command structure and format

The command will follow this format:

pg_restore [connection-option…] [option…] [filename]

For a full list of options for these commands, check out the PostgreSQL documentation.

Restoring a backup with pg_restore

Restoring from a .tar archive requires the user to consider whether the database already exists and which format the backup uses.

If the database dbcooper already exists, the following command will restore it:

pg_restore -U postgres -Ft -d dbcooper < dbcooper.tar

If the database doesn't yet exist, tweaking the command like the following will restore it:

pg_restore -U postgres -Ft -C -d dbcooper < dbcooper.tar

The following command restores a plain-text SQL backup file (here: back_it_on_up.sql):

psql -f back_it_on_up.sql


The PostgreSQL pg_dumpall command

pg_dumpall

The pg_dumpall utility extracts an entire database cluster into a script file.

Using pg_dumpall, a single command lets the user back up an entire cluster of databases into one script file. The file works the same way as one produced by pg_dump, meaning the script uses SQL commands to restore all databases.

In fact, this command will call pg_dump for each database in the cluster. Some parts of the architecture, like global objects, are saved too. Database roles, tablespaces, and any information that is common to all databases will be saved by pg_dumpall, something that pg_dump will never touch.

Requirements
To effectively use the tool, you will likely have to be logged into the database as a superuser to get a complete dump. Superuser privileges will also be useful to execute the saved script so that you can add roles and create databases.

The final file will use the standard SQL script output. Running this utility will require connecting to the PostgreSQL server once per database while performing the dump. If you use password authentication, you will have to provide the password for each database in the cluster.
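To avoid retyping the password for every database, a ~/.pgpass file can supply credentials automatically; a sketch of one entry (host, user, and password are placeholders):

```
# ~/.pgpass — one line per connection: hostname:port:database:username:password
# The file must be readable only by you: chmod 600 ~/.pgpass
localhost:5432:*:postgres:your_password_here
```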

pg_dumpall structure and format

The command for the pg_dumpall utility will be structured like the following:

pg_dumpall [connection-option…] [option…]

PostgreSQL has extensive documentation that covers all options available for using the tool if you are looking for something specific.

Backup all databases with pg_dumpall

The following command will dump all databases:

pg_dumpall > db.out

This command will reload databases from file:

psql -f db.out postgres

This command will dump all files and create a single file called back_it_on_up.sql

pg_dumpall -f back_it_on_up.sql

pg_dumpall error messages

Many of the error messages that pop up will refer to pg_dump, because pg_dumpall runs that utility internally. Some errors are harmless: the generated script issues CREATE ROLE for every role in the cluster, so roles that already exist on the target, like the bootstrap superuser, will trigger a "role already exists" error.

Using the --clean option with pg_dumpall

Databases will retain any previous contents and database-level properties. If you want to be sure that the databases are restored exactly as they are, using the --clean option may be useful.

The option authorizes the script to recreate the built-in databases and ensures that each database keeps the same properties it had in the original cluster. Using this option will kick back some errors about non-existent objects, but these can be safely ignored.

Using the --if-exists option with pg_dumpall

Adding the --if-exists option suppresses those errors if they are too distracting.
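Combining the two options looks like this (output file name is illustrative):

```shell
# Drop objects before recreating them, and wrap the DROPs in IF EXISTS
# so errors about missing objects are silenced.
pg_dumpall --clean --if-exists > cluster.sql
```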

After a restore, running ANALYZE on each database gives the query optimizer up-to-date statistics.
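One way to refresh statistics for every database after a restore is the standard vacuumdb client tool:

```shell
# Analyze all databases in the cluster without performing a vacuum.
vacuumdb --all --analyze-only
```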



Conclusion

By using these utilities, many PostgreSQL backup tasks can be accomplished with little effort.

The command that is right depends on whether the user wants to dump or restore a file and whether they want to back up everything or just a single database. Adding in the options that are available for the database system allows the user to customize how the backup happens and what the final result looks like.

Automated PostgreSQL Backups as a service

SimpleBackups is a database and website backup automation tool that offloads all your backup tasks.
It automates PostgreSQL backups and sends them to the cloud storage of your choice.
SimpleBackups helps you make sure your backups run well, on time, and securely, with all the bells and whistles.

→ Set up your first PostgreSQL backup for free


