
pg_dump --exclude-table pattern matching

The Problem

I use pg_dump to create a schema-only archive file of our database. I'd like to reduce the size of this file (and decrease the restore time) by omitting all but one of a set of partitioned tables in the dump.

For example, there are the following partitioned tables (by date) in the database. I only want to keep the last one.

awp_partition.awp_text_search_history_201209
awp_partition.awp_text_search_history_201210
awp_partition.awp_text_search_history_201211
awp_partition.awp_text_search_history_201212
plus hundreds more...

I created a pg_dump command (called inside a bash script) designed to exclude all of these tables except for the latest one using a negative lookahead regular expression:

pg_dump -h 11.111.11.11 -p 5432 -U username -F c -s \
-T 'awp_partition.awp_text_search_history_(?!201212)\d{6}' \
dbname > /home/me/tmp/prod3.backup

However, when I run this command, all of the tables are excluded from the dump file.

What I've tried

I tried combining the table include (-t) and exclude (-T) parameters, but excluding all tables while including just one that also matched the exclude pattern caused the entire dump to fail.

I tested my regex using the Postgres regexp_matches() function, and it correctly matched the tables I expected. However, the pg_dump documentation says:

Dump only tables (or views or sequences or foreign tables) matching table. Multiple tables can be selected by writing multiple -t switches. Also, the table parameter is interpreted as a pattern according to the same rules used by psql's \d commands (see Patterns), so multiple tables can also be selected by writing wildcard characters in the pattern. When using wildcards, be careful to quote the pattern if needed to prevent the shell from expanding the wildcards;

And the related psql patterns documentation says:

Advanced users can use regular-expression notations such as character classes, for example [0-9] to match any digit. All regular expression special characters work as specified in Section 9.7.3, except for . which is taken as a separator as mentioned above, * which is translated to the regular-expression notation .*, ? which is translated to ., and $ which is matched literally. You can emulate these pattern characters at need by writing ? for ., (R+|) for R*, or (R|) for R?. $ is not needed as a regular-expression character since the pattern must match the whole name, unlike the usual interpretation of regular expressions (in other words, $ is automatically appended to your pattern). Write * at the beginning and/or end if you don't wish the pattern to be anchored. Note that within double quotes, all regular expression special characters lose their special meanings and are matched literally. Also, the regular expression special characters are matched literally in operator name patterns (i.e., the argument of \do).
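The translation rules quoted above can be sketched in the shell: each ? becomes ., each * becomes .*, and the result is anchored to match the whole name. This is a simplified illustration of those rules, not psql's actual implementation; the table names are the ones from the question.

```shell
# Translate a psql-style pattern into an anchored POSIX regex,
# per the rules quoted above: '?' -> '.', '*' -> '.*', then anchor.
pattern='awp_text_search_history_??????'
regex="^$(printf '%s' "$pattern" | sed -e 's/?/./g' -e 's/\*/.*/g')$"

# The six '?' wildcards require exactly six trailing characters,
# so only the first name matches.
printf '%s\n' \
  awp_text_search_history_201209 \
  awp_text_search_history_2012 \
| grep -E "$regex"
```

Note that nothing in this translation produces a lookahead, which is consistent with the suspicion below that (?!...) is not usable here.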

I realize that the syntax of the negative lookahead operator may not be supported in this operation.

What to do?

It looks like I need to change my pattern matching strategy, and I'm struggling to think of a way to exclude all but one of those tables using psql \d patterns. Any ideas?

asked Jun 06 '14 by EndlessLoop



1 Answer

For pg_dump you need to pass the schema and the table in separate options: -n for the schema and -T for the table pattern to exclude. You need to run something like

pg_dump -h 11.111.11.11 -p 5432 -U username -F c -s \
-n awp_partition -T 'awp_text_search_history_(?!201212)\d{6}' \
dbname > /home/me/tmp/prod3.backup
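If pattern matching still misbehaves, a blunter fallback is to generate one explicit -T flag per partition to drop and splice the flags into the command. This is only a sketch: the table list would normally come from the catalog, and the names and keep-suffix here are the illustrative ones from the question.

```shell
# Sketch: build an explicit -T flag for every partition except the one
# to keep (201212). The table list is hard-coded for illustration.
KEEP=201212
tables="awp_text_search_history_201209
awp_text_search_history_201210
awp_text_search_history_201211
awp_text_search_history_201212"

exclude_flags=""
for t in $tables; do
  case "$t" in
    *_"$KEEP") ;;  # the single partition kept in the dump
    *) exclude_flags="$exclude_flags -T awp_partition.$t" ;;
  esac
done

echo "$exclude_flags"
# The flags can then be spliced into the dump command, e.g.:
#   pg_dump -h 11.111.11.11 -p 5432 -U username -F c -s \
#     $exclude_flags dbname > /home/me/tmp/prod3.backup
```

Afterwards, running pg_restore -l against the custom-format archive lists its contents, so you can confirm which tables were actually dumped without touching a database.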

It's a (relatively) old thread, but maybe someone in the future will have a similar problem.

answered Sep 25 '22 by Nebojsa Susic