We have a master-slave replication configuration as follows.
On the master:
postgresql.conf has replication configured as follows (commented lines taken out for brevity):
max_wal_senders = 1
wal_keep_segments = 8
On the slave:
Same postgresql.conf as on the master. recovery.conf looks like this:
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication password=replication'
trigger_file = '/tmp/postgresql.trigger.5432'
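(Not shown above: the master also needs a pg_hba.conf entry allowing the slave to connect as the replication user; it looks something like the line below, where the address range and auth method are placeholders, not taken from the actual setup.)
# on the master, in pg_hba.conf
host    replication    replication    192.168.0.0/24    md5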
When this was initially set up, we performed some simple tests and confirmed that replication was working. However, when we did the initial data load, only some of the data made it to the slave.
The slave's log is now filled with messages that look like this:
< 2015-01-23 23:59:47.241 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:47.241 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:52.259 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:52.260 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:57.270 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:57.270 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
After some analysis and help on the #postgresql IRC channel, I've come to the conclusion that the slave cannot keep up with the master. My proposed solution is as follows.
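For reference, the lag can be measured with the standard 9.3 replication views and functions; the queries below are only a sketch, not something from the original setup.
-- on the master, once the slave is connected: WAL positions per standby
SELECT client_addr, state, sent_location, replay_location FROM pg_stat_replication;
-- on the slave: byte lag between the last received and the last replayed WAL
SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(), pg_last_xlog_replay_location());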
On the master:
max_wal_senders = 5
wal_keep_segments = 4000
Yes, I know that is very high, but I'd like to monitor the situation and see what happens. I have room on the master.
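For reference, at the default 16 MB segment size that retention works out to roughly 62 GB of pg_xlog on the master; a quick sanity check:
SELECT pg_size_pretty(4000::bigint * 16 * 1024 * 1024);   -- about 62 GB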
On the slave:
1. Back up pg_hba.conf, pg_ident.conf, postgresql.conf, and recovery.conf
2. rm -rf /var/lib/pgsql/9.3/data/* (this seems to be required by pg_basebackup)
3. pg_basebackup -h master -D /var/lib/pgsql/9.3/data --username=replication --password
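Put together as a shell sequence, the plan would look roughly like this; the service name, the stop/start, and the config restore step are assumptions layered on top of the list above, not part of the original steps.
# sketch: stop the server, keep the config files, re-seed with pg_basebackup, restore configs, restart
service postgresql-9.3 stop
mkdir -p /tmp/pg_conf_backup
cp /var/lib/pgsql/9.3/data/{pg_hba.conf,pg_ident.conf,postgresql.conf,recovery.conf} /tmp/pg_conf_backup/
rm -rf /var/lib/pgsql/9.3/data/*
pg_basebackup -h master -D /var/lib/pgsql/9.3/data --username=replication --password
cp /tmp/pg_conf_backup/* /var/lib/pgsql/9.3/data/
service postgresql-9.3 start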
Am I missing anything? Is there a better way to bring the slave up to date without having to reload all the data?
Any help is greatly appreciated.
For ClusterControl users, all that's needed is to go to the “Nodes” tab, run the “Rebuild Replication Slave” job, and pick the node to rebuild the slave from. ClusterControl will use pg_basebackup to set up the replication slave and configure the replication as soon as the data is transferred.
The most common reasons for an increase in replica lag are: configuration differences between the primary and replica instances, a heavy write workload on the primary instance, and long-running transactions.
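A quick way to spot the last of those (long-running transactions) on the primary is the standard pg_stat_activity view; the exact column selection below is just a sketch.
-- oldest open transactions on the primary
SELECT pid, now() - xact_start AS xact_age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;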
wal_keep_segments (integer) Specifies the minimum number of past log file segments kept in the pg_xlog directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes.
The two important options for dealing with the WAL for streaming replication:
wal_keep_segments should be set high enough to allow a slave to catch up after a reasonable lag (e.g. high update volume, the slave being offline, etc.).
archive_mode enables WAL archiving, which can be used to recover files older than what wal_keep_segments provides. The slave servers simply need a method to retrieve the WAL segments. NFS is the simplest method, but anything from scp to http to tapes will work, so long as it can be scripted.
# on the master (postgresql.conf)
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
# on the slave (recovery.conf)
restore_command = 'cp /path_to/archive/%f "%p"'
When the slave can't pull the WAL segment directly from the master, it will attempt to use the restore_command to load it. You can configure the slave to automatically remove segments using the archive_cleanup_command setting.
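For example, reusing the archive path from the snippet above, the slave's recovery.conf can call the pg_archivecleanup utility shipped with PostgreSQL:
# in recovery.conf on the slave
archive_cleanup_command = 'pg_archivecleanup /path_to/archive %r'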
If the slave comes to a situation where the next WAL segment it needs is missing from both the master and the archive, there will be no way to consistently recover the database. The only reasonable option then is to scrub the server and start again from a fresh pg_basebackup.
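When taking that fresh base backup, one way to avoid hitting the same removed-segment race during the copy itself is to stream the WAL alongside the backup; the host and data directory below simply reuse the values from the question.
pg_basebackup -h master -D /var/lib/pgsql/9.3/data -U replication --password --xlog-method=stream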