Verifying data consistency between two PostgreSQL databases

This is specifically about maintaining confidence, when using various replication solutions, that you could fail over to the other server without data loss, or, in a master-master setup, that you would know within a reasonable amount of time if one of the databases had fallen out of sync.

Are there any tools out there for this, or do people generally depend on the replication system itself to warn about inconsistencies? I'm currently most familiar with PostgreSQL WAL shipping in a master-standby setup, but I'm considering a master-master setup with something like PgPool. However, that solution is a little less directly tied to PostgreSQL itself (my basic understanding is that it provides the connection an app would use, intercepts the SQL statements, and then sends them on to whatever servers are in its pool), which got me thinking more about actually verifying data consistency.

Specific requirements:

I'm not talking about just table structure. I'd want to know that the actual record data is the same, so that I'd know if records were corrupted or missed (in which case I would re-initialize the bad database from a recent backup plus WAL files before bringing it back into the pool).
The databases are on the order of 30-50 GB, so I doubt that raw SELECT queries would work very well.
I don't see the need for real-time checking (though it would, of course, be nice). Hourly or even daily would be better than nothing.
Block-level checking wouldn't work, since these would be two databases with independent storage.

Or is this type of verification simply not realistic?

asked May 14 '13 18:05

David Ackerman

1 Answer

You can check the current WAL locations on both machines. If they report the same value, the standby has received and replayed everything the primary has written, which means the two databases are in sync with each other.

On the primary host:

$ psql -h 192.168.0.10 -c "SELECT pg_current_xlog_location()"
 pg_current_xlog_location 
--------------------------
 0/2000000
(1 row)

On the standby host:

$ psql -h 192.168.0.20 -c "SELECT pg_last_xlog_receive_location()"
 pg_last_xlog_receive_location 
-------------------------------
 0/2000000
(1 row)

$ psql -h 192.168.0.20 -c "SELECT pg_last_xlog_replay_location()"
 pg_last_xlog_replay_location 
------------------------------
 0/2000000
(1 row)
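
For an hourly or daily check, comparing the values by eye does not scale, so here is a minimal sketch of turning two WAL locations into a byte difference. The function names lsn_to_bytes and lsn_lag_bytes are made up for this illustration. The arithmetic assumes PostgreSQL 9.3 or later, where a location is a 64-bit value printed as two 32-bit hexadecimal halves. Note also that from PostgreSQL 10 onward the functions above are named pg_current_wal_lsn(), pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn().

```shell
#!/bin/sh
# Convert a WAL location such as 0/2000000 into an absolute byte position.
# Assumption: PostgreSQL 9.3+, where the location is a 64-bit value printed
# as two 32-bit hexadecimal halves separated by a slash.
lsn_to_bytes() {
    hi=${1%/*}    # hexadecimal part before the slash
    lo=${1#*/}    # hexadecimal part after the slash
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes the second location is behind the first.
# A result of 0 means both servers report the same WAL position.
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Hypothetical usage with the hosts from the example above
# (-A and -t make psql print only the bare value):
#   primary=$(psql -At -h 192.168.0.10 -c "SELECT pg_current_xlog_location()")
#   standby=$(psql -At -h 192.168.0.20 -c "SELECT pg_last_xlog_replay_location()")
#   lsn_lag_bytes "$primary" "$standby"
```

On a busy primary the difference will rarely be exactly zero at the moment of sampling, so a scheduled check would probably alert on a threshold. The server can also do this subtraction itself with pg_xlog_location_diff() (pg_wal_lsn_diff() from PostgreSQL 10).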

You can also check this with the help of the walsender and walreceiver processes:

On the primary host:

$ ps -ef | grep sender
postgres  6879  6831  0 10:31 ?        00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000

On the standby host:

$ ps -ef | grep receiver
postgres  6878  6872  1 10:31 ?        00:00:01 postgres: wal receiver process   streaming 0/2000000
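
If you want to script the process-based check, the position can be pulled out of the ps line. This is only a sketch: extract_streaming_lsn is a made-up helper name, and it assumes the process title contains "streaming" followed by the location, as in the listings above. The exact title format varies between PostgreSQL versions.

```shell
#!/bin/sh
# Read ps output on stdin and print the WAL location that follows the word
# "streaming". Lines without such a location produce no output.
# Assumption: the process title has the form shown in the listings above.
extract_streaming_lsn() {
    sed -n 's/.*streaming \([0-9A-Fa-f]*\/[0-9A-Fa-f]*\).*/\1/p'
}

# Hypothetical usage; the [w] keeps grep from matching its own process:
#   ps -ef | grep '[w]al sender'   | extract_streaming_lsn   # on the primary
#   ps -ef | grep '[w]al receiver' | extract_streaming_lsn   # on the standby
```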

answered Sep 27 '22 20:09

Samurai

Tags: postgresql, replication, data-consistency
