I am creating two pg_dumps, DUMP1 and DUMP2.
DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in the REVERSE order of DUMP1.
Is there any way that I can sort the two dumps so that the two dump files are exactly the same (when using a diff)?
I am using PHP and Linux. I tried using "sort" in Linux, but that does not work...
Thanks!
The only impact of pg_dump is the increased I/O load and the long-running transaction it creates. The long transaction will keep autovacuum from reclaiming dead tuples for the duration of the dump. Normally that is no big problem unless you have very high write activity in the database.
One caveat: pg_dump does not dump roles or other cluster-wide objects such as tablespaces; it dumps only a single database. To back up your entire PostgreSQL cluster, pg_dumpall is the better choice. pg_dumpall can handle the entire cluster, backing up information on roles, tablespaces, users, permissions, etc.
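For example, a minimal sketch (the output file names are just placeholders):
# dump every database in the cluster, plus roles and tablespaces
pg_dumpall -U postgres > cluster.sql
# or dump only the global objects (roles, tablespaces)
pg_dumpall -U postgres --globals-only > globals.sql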
c:\Program files\postgresql\9.3\bin> pg_dump -h localhost -p 5432 -U postgres test > D:\backup.sql
After running the above command, enter the password for user "postgres" and check the D:\ drive for the backup.sql file.
Restoring the data from a pg_dump doesn't overwrite the existing data; it appends the data to the original database.
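If you want a clean restore rather than appending, one common approach (just a sketch; the database and file names are placeholders) is to drop and recreate the database before loading the dump:
# start from an empty database, then load the plain-format dump
dropdb -U postgres test
createdb -U postgres test
psql -U postgres -d test -f backup.sql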
From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.
As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.
Here is a way that you can do the total comparison including schema and data.
First, compare the schema using this method.
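That method is not reproduced here; one way to do it (a sketch, assuming plain schema-only dumps are enough for your comparison, with placeholder database names) is:
# dump only the schema of each database and diff the results
pg_dump --schema-only db1 > schema1.sql
pg_dump --schema-only db2 > schema2.sql
diff schema1.sql schema2.sql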
Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).
The query below generates the COPY statements.
-- generate one COPY statement per table, ordering each table's rows by its primary key columns
select
  'copy (select * from ' || r.relname || ' order by ' ||
  -- ordering the aggregate by attnum keeps the key column list deterministic
  array_to_string(array_agg(a.attname order by a.attnum), ',') ||
  ') to STDOUT;'
from
  pg_class r,
  pg_constraint c,
  pg_attribute a
where
  r.oid = c.conrelid
  and r.oid = a.attrelid
  and a.attnum = ANY(c.conkey)
  and c.contype = 'p'
  and r.relkind = 'r'
group by
  r.relname
order by
  r.relname;
Running that query will give you a list of statements like copy (select * from test order by a,b) to STDOUT;
Put those all in a text file, run them through psql against each database, and then compare the output files. You may need to tweak the output settings for COPY; a rough sketch of the whole workflow follows.
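This sketch uses placeholder database and file names, and it assumes both databases have the same tables and primary keys (which the schema comparison above should already have verified); -Atq makes psql print only the generated statements, without headers or alignment:
# generate the COPY statements once, then run them against each database
psql -Atq -d db1 -f generate_copies.sql > copies.sql
psql -d db1 -f copies.sql > db1-data.txt
psql -d db2 -f copies.sql > db2-data.txt
# any difference in the data will show up here
diff db1-data.txt db2-data.txt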
My solution was to write my own program to post-process the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the Java default memory of 512MB it should work with up to 10 million records per table, since the record info (primary key value, file offsets) is held in memory.
You can use this little Java program, e.g. with
java -cp ./pgdumpsort.jar PgDumpSort db.sql
And you get a file named "db-sorted.sql", or specify the output file name:
java -cp ./pgdumpsort.jar PgDumpSort db.sql db-$(date +%F).sql
And the sorted data is in a file like "db-2013-06-06.sql"
Now you can create patches using diff
diff --speed-large-files -uN db-2013-06-05.sql db-2013-06-06.sql >db-0506.diff
This allows you to create incremental backups, which are usually much smaller. To restore the files you have to apply the patch to the original file using
patch -p1 < db-0506.diff
(The source code is inside the JAR file.)