Postgres: Checkpoints Are Occurring Too Frequently

Tags:

postgresql

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.

We're getting this warning about 300 times a day while rebuilding these datasets, with long stretches where the warnings average 2 - 5 seconds apart:

2015-01-15 12:32:53 EST [11403]: [10841-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)

These are the related settings:

checkpoint_completion_target    0.7
checkpoint_segments 64
checkpoint_timeout  5min
checkpoint_warning  30s
wal_block_size  8192
wal_buffers     4MB
wal_keep_segments   5000
wal_level   hot_standby
wal_receiver_status_interval    10s
wal_segment_size    16MB
wal_sync_method     fdatasync
wal_writer_delay    200ms
work_mem    96MB
shared_buffers  24GB
effective_cache_size    128GB

So that means we're writing 1024 MB of WAL (64 segments of 16 MB each) every 2 - 5 seconds, sometimes sustained for 15 - 30 minutes.
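The arithmetic behind that estimate can be sketched as a query. It assumes each checkpoint in the log was triggered by filling all 64 segments (rather than by checkpoint_timeout), so the figures are rough rates, not measurements:

```sql
-- WAL written between two segment-triggered checkpoints, using the posted
-- settings: checkpoint_segments = 64, wal_segment_size = 16MB.
SELECT 64 * 16                 AS wal_mb_per_checkpoint,  -- 1024
       round(64 * 16 / 5.0, 1) AS mb_per_sec_at_5s,       -- 204.8
       round(64 * 16 / 2.0, 1) AS mb_per_sec_at_2s;       -- 512.0
```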

1) Do you see any settings we can improve on? Let me know if you need other settings documented.

2) Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?

The data we're rebuilding is stored elsewhere, so on the off chance the power failed AND the RAID battery backup didn't do its job, we're not out anything once the dataset gets rebuilt again.

Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?

Thanks!

PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.


asked Jan 15 '15 20:01

user1517922

1 Answer

Your server is generating this much WAL data because wal_level is set to hot_standby. I'm assuming you need that for replication, so the best option to avoid the warnings is to increase checkpoint_segments. But they are just that - warnings - and it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be running those frequently.
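As a rough way to see whether a larger checkpoint_segments is helping, the checkpoint counters can be compared before and after the change. This is only a sketch: the value 256 is an arbitrary illustration, not a recommendation, and the pg_stat_bgwriter columns used here are the ones present in the releases that still have checkpoint_segments.

```sql
-- Checkpoints triggered by checkpoint_timeout vs. by filling checkpoint_segments.
-- A checkpoints_req count that climbs quickly during a rebuild matches the warnings.
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;

-- After editing postgresql.conf by hand (for example: checkpoint_segments = 256,
-- an illustrative value only), reload the configuration and confirm the setting.
SELECT pg_reload_conf();
SHOW checkpoint_segments;
```

Keep in mind that more segments also means more WAL kept on disk between checkpoints and potentially longer crash recovery.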

Changing synchronous_commit does not change what is written to the WAL. It only changes when the commit returns: with it off, the commit no longer waits for the WAL to be flushed to disk, which lets those writes be buffered.
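A minimal sketch of how the setting from the question would be scoped to a single rebuild transaction. The table rebuilt_dataset and its columns are hypothetical and stand in for the real rebuild statements:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS rebuilt_dataset (id bigint PRIMARY KEY, payload text);

BEGIN;
-- SET LOCAL lasts only until this transaction ends.
SET LOCAL synchronous_commit TO OFF;

DELETE FROM rebuilt_dataset;
INSERT INTO rebuilt_dataset (id, payload)
SELECT g, 'row ' || g FROM generate_series(1, 1000) AS g;

COMMIT;  -- returns without waiting for the WAL flush

-- Outside the transaction the session or server default applies again.
SHOW synchronous_commit;
```

The same WAL records are still generated and still streamed to the replicas, so this would not be expected to reduce the checkpoint warnings themselves.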

It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.
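The unlogged-table approach might look roughly like the sketch below. The names dataset_live and dataset_build and their columns are hypothetical; the real rebuild logic would replace the sample INSERT and UPDATE.

```sql
-- Hypothetical logged table that the application and replicas read.
CREATE TABLE IF NOT EXISTS dataset_live (id bigint PRIMARY KEY, payload text);

-- Unlogged staging table: changes to it are not written to the WAL,
-- so it is not replicated and is emptied after a crash.
CREATE UNLOGGED TABLE dataset_build (id bigint PRIMARY KEY, payload text);

-- The write-heavy intermediate work happens here without WAL traffic.
INSERT INTO dataset_build (id, payload)
SELECT g, 'row ' || g FROM generate_series(1, 1000) AS g;
UPDATE dataset_build SET payload = upper(payload);

-- Publish the finished result into the logged table in one transaction.
BEGIN;
DELETE FROM dataset_live;
INSERT INTO dataset_live (id, payload)
SELECT id, payload FROM dataset_build;
COMMIT;

DROP TABLE dataset_build;
```

The final copy into the logged table is still WAL-logged, so the saving comes only from the intermediate writes made while building the dataset.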


answered Sep 21 '22 04:09

Ben Grimm
