I am creating two pg_dumps, DUMP1 and DUMP2.
DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in the REVERSE order of DUMP1.
Is there any way that I can sort the two dumps so that the two dump files are exactly the same (when using a diff)?
I am using PHP and Linux. I tried using "sort" in Linux, but that does not work...
Thanks!
The only impact of pg_dump is the increased I/O load and the long-running transaction it creates. The long transaction will keep autovacuum from reclaiming dead tuples for the duration of the dump. Normally that is no big problem unless you have very high write activity in the database.
One caveat: pg_dump does not dump roles or other cluster-wide objects such as tablespaces; it dumps only a single database. To back up your entire PostgreSQL cluster, pg_dumpall is the better choice. pg_dumpall can handle the entire cluster, backing up information on roles, tablespaces, users, permissions, etc.
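For example, a minimal sketch (the output file names are just placeholders):
# dump every database in the cluster, plus roles and tablespaces
pg_dumpall -U postgres > cluster.sql
# or dump only the global objects (roles, tablespaces)
pg_dumpall -U postgres --globals-only > globals.sql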
c:\Program files\postgresql\9.3\bin> pg_dump -h localhost -p 5432 -U postgres test > D:\backup.sql
After running the above command, enter the password for user "postgres" and check the D:\ drive for the backup.sql file.
Restoring the data from a pg_dump doesn't overwrite the existing data; it appends the data to the original database.
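If you want a clean restore rather than appending, one common approach (just a sketch; the database and file names are placeholders) is to drop and recreate the database before loading the dump:
# start from an empty database, then load the plain-format dump
dropdb -U postgres test
createdb -U postgres test
psql -U postgres -d test -f backup.sql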
From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.
As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.
Here is a way that you can do the total comparison including schema and data.
First, compare the schema using this method.
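That method is not reproduced here; one way to do it (a sketch, assuming plain schema-only dumps are enough for your comparison, with placeholder database names) is:
# dump only the schema of each database and diff the results
pg_dump --schema-only db1 > schema1.sql
pg_dump --schema-only db2 > schema2.sql
diff schema1.sql schema2.sql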
Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).
The query below generates the COPY statements.
-- generate one COPY statement per table, ordering each table's rows by its primary key columns
select
  'copy (select * from ' || r.relname || ' order by ' ||
  -- ordering the aggregate by attnum keeps the key column list deterministic
  array_to_string(array_agg(a.attname order by a.attnum), ',') ||
  ') to STDOUT;'
from
  pg_class r,
  pg_constraint c,
  pg_attribute a
where
  r.oid = c.conrelid
  and r.oid = a.attrelid
  and a.attnum = ANY(c.conkey)
  and c.contype = 'p'
  and r.relkind = 'r'
group by
  r.relname
order by
  r.relname;
Running that query will give you a list of statements like copy (select * from test order by a,b) to STDOUT;
Put those all in a text file, run them through psql against each database, and then compare the output files. You may need to tweak the output settings for COPY; a rough sketch of the whole workflow follows.
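This sketch uses placeholder database and file names, and it assumes both databases have the same tables and primary keys (which the schema comparison above should already have verified); -Atq makes psql print only the generated statements, without headers or alignment:
# generate the COPY statements once, then run them against each database
psql -Atq -d db1 -f generate_copies.sql > copies.sql
psql -d db1 -f copies.sql > db1-data.txt
psql -d db2 -f copies.sql > db2-data.txt
# any difference in the data will show up here
diff db1-data.txt db2-data.txt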
My solution was to write my own program to post-process the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the Java default memory of 512MB it should work with up to 10 million records per table, since the record info (primary key value, file offsets) is held in memory.
You can use this little Java program, e.g. with
java -cp ./pgdumpsort.jar PgDumpSort db.sql
And you get a file named "db-sorted.sql", or specify the output file name:
java -cp ./pgdumpsort.jar PgDumpSort db.sql db-$(date +%F).sql
And the sorted data is in a file like "db-2013-06-06.sql"
Now you can create patches using diff
diff --speed-large-files -uN db-2013-06-05.sql db-2013-06-06.sql >db-0506.diff
This allows you to create incremental backups, which are usually much smaller. To restore the files you have to apply the patch to the original file using
patch -p1 < db-0506.diff
(The source code is inside the JAR file.)