A faster way to copy a postgresql database (or the best way)

Tags:

postgresql

I did a pg_dump of a database and am now trying to install the resulting .sql file on to another server.

I'm using the following command.

psql -f databasedump.sql 

I started the import earlier today, and now, 7 hours later, the database is still being populated. I don't know whether this is how long it is supposed to take, but I've kept monitoring it and so far I've seen over 12 million inserts and counting. I suspect there's a faster way to do this.
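For reference, a plain-text dump can already be loaded somewhat faster by wrapping it in a single transaction and relaxing synchronous commits for the session. A minimal sketch, assuming a target database named mydb (a placeholder) and the dump file above:

# hypothetical example: load the plain SQL dump in one transaction,
# stop on the first error, and skip the per-commit flush for this session
PGOPTIONS="-c synchronous_commit=off" \
  psql -d mydb -v ON_ERROR_STOP=1 --single-transaction -f databasedump.sql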

asked Mar 28 '13 by David Bain

People also ask

What is the best way to transfer the data in a PostgreSQL?

If you really have two distinct PostgreSQL databases, the common way of transferring data from one to the other is to export your tables to a file (with pg_dump -t) and import them into the other database (with psql).
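A minimal sketch of that table-level transfer, with mytable, source_db and target_db as placeholder names:

# dump a single table from the source database to a plain SQL file
pg_dump -t mytable source_db > mytable.sql

# load that file into the target database
psql -d target_db -f mytable.sql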

Why does pg_dump take so long?

It seems that pg_dump's built-in compression is rather slow when the data is already compressed, as it is with image data stored in bytea columns. It is better to disable it (-Z0) and compress outside of pg_dump; in that case the time dropped from ~70 minutes to ~5 minutes.
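A sketch of compressing outside of pg_dump, with gzip standing in for whichever external compressor you prefer and myDb as a placeholder database name:

# disable pg_dump's built-in compression and compress the stream externally
pg_dump -Fc -Z0 myDb | gzip > myDb.dump.gz

# later, decompress straight back into pg_restore (no -j here, since the input is a pipe)
gunzip -c myDb.dump.gz | pg_restore -d myDb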


2 Answers

Create your dumps with

pg_dump -Fc -Z 9  --file=file.dump myDb 

-Fc: --format=custom

Output a custom archive suitable for input into pg_restore. This is the most flexible format in that it allows reordering of loading data as well as object definitions. This format is also compressed by default.

-Z 9: --compress=0..9

Specify the compression level to use. Zero means no compression. For the custom archive format, this specifies compression of individual table-data segments, and the default is to compress at a moderate level. For plain text output, setting a nonzero compression level causes the entire output file to be compressed, as though it had been fed through gzip; but the default is not to compress. The tar archive format currently does not support compression at all.

and restore it with

pg_restore -Fc -j 8 -d myDb file.dump

-j: --jobs=number-of-jobs

Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.

Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.

The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.

Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
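Putting the two commands together, a sketch of a full round trip, assuming the dump was created with the -Fc command above and that connecting to the postgres maintenance database is acceptable (-C makes pg_restore issue the CREATE DATABASE itself):

# inspect the archive's table of contents; the custom format lets you reorder or skip entries
pg_restore -l file.dump

# recreate the database recorded in the dump and load it with 8 parallel jobs
pg_restore -C -d postgres -j 8 file.dump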

Links:

pg_dump

pg_restore

answered Oct 15 '22 by mullerivan


Improve pg_dump & pg_restore

pg_dump: always use the directory format (-Fd) together with the -j option

time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external 

pg_restore: always tune postgresql.conf and restore the directory-format dump with the -j option

work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1

time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
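One way to apply that tuning just for the duration of the restore, sketched under the assumptions that you have superuser access, a restart is acceptable (shared_buffers and wal_buffers only take effect after a restart), and the service is managed by systemd under the name postgresql:

# temporarily relax durability and maintenance settings for the bulk load
psql -d postgres <<'SQL'
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET full_page_writes = off;
ALTER SYSTEM SET autovacuum = off;
ALTER SYSTEM SET wal_buffers = -1;
SQL
sudo systemctl restart postgresql

# parallel restore from the directory-format dump
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/

# revert everything set via ALTER SYSTEM and refresh planner statistics
psql -d postgres -c "ALTER SYSTEM RESET ALL"
sudo systemctl restart postgresql
vacuumdb --all --analyze-only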

For more info

https://gitlab.com/yanar/Tuning/wikis/improve-pg-dump&restore

answered Oct 15 '22 by Yanar Assaf