Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

verifying data consistency between two postgresql databases

This is specifically about maintaining confidence in using various replication solutions that you'd be able to failover to the other server without data loss. Or in a master-master situation that you could know within a reasonable amount of time if one of the databases has fallen out of sync.

Are there any tools out there for this, or do people generally depend on the replication system itself to warn over inconsistencies? I'm currently most familiar with postgresql WAL shipping in a master-standby setup, but am considering a master-master setup with something like PgPool. However, as that solution is a little less directly tied with PostgreSQL itself (my basic understanding is that it provides the connection an app would use, thus intercepting the various SQL statements, and would then send them on to whatever servers were in its pool), it got me thinking more about actually verifying data consistency.

Specific requirements:

  1. I'm not talking about just table structure. I'd want to know that actual record data is the same, so that I'd know if records were corrupted or missed (in which case, I would re-initialize the bad database with a recent backup + WAL files before bringing it back into the pool)

  2. Databases are in the order of 30-50 GB. I'm doubting that raw SELECT queries would work very well.

  3. I don't see the need for real-time checking (though it would, of course, be nice). Hourly or even daily would be better than nothing.

  4. Block-level checking wouldn't work. It would be two databases with independent storage.

Or is this type of verification simply not realistic?

like image 981
David Ackerman Avatar asked May 14 '13 18:05

David Ackerman


People also ask

How do I check consistency in PostgreSQL?

There is no built-in consistency check command or tool in PostgreSQL. The general view is that one should not be necessary, as corruption and inconsistency should not be possible on a quality hardware/software stack.

How do you ensure data consistency in a database?

Database consistency is achieved by establishing rules. Any transaction of data written to the database must only change affected data as defined by the specific constraints, triggers, variables, cascades, etc., established by the rules set by the database's developer.

How we can check data consistency?

One way to check data consistency is to use the CHECK DATA, CHECK INDEX, and CHECK LOB online utilities. You might want to use these utilities when you do a conditional restart or a point-in-time recovery or repair data.


1 Answers

You can check the current WAL locations on both the machines... If they represent the same value, that means your underlying databases are consistent with each other...

$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (do it on primary host)
 pg_current_xlog_location 
--------------------------
 0/2000000
(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (do it on standby host)
 pg_last_xlog_receive_location 
-------------------------------
 0/2000000
(1 row)

$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (do it on  standby host)
 pg_last_xlog_replay_location 
------------------------------
 0/2000000
(1 row)

you can also check this with the help of walsender and walreceiver processes:

[do it on  primary] $ ps -ef | grep sender
postgres  6879  6831  0 10:31 ?        00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000

[ do it on standby] $ ps -ef | grep receiver
postgres  6878  6872  1 10:31 ?        00:00:01 postgres: wal receiver process   streaming 0/2000000
like image 151
Samurai Avatar answered Sep 27 '22 20:09

Samurai