Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres: Checkpoints Are Occurring Too Frequently

Tags:

postgresql

We have a powerful Postgres server (64 cores, 384 GB RAM, 16 15k SAS drives, RAID 10), and several times during the day we rebuild several large datasets, which is very write intensive. Apache and Tomcat also run on the same server.

We're getting this warning about 300 times a day, while rebuilding these datasets, with long stretches where the errors are averaging 2 - 5 seconds apart:

2015-01-15 12:32:53 EST [11403]: [10841-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:32:56 EST [11403]: [10845-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2015-01-15 12:32:58 EST [11403]: [10849-1] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2015-01-15 12:33:01 EST [11403]: [10853-1] LOG:  checkpoints are occurring too frequently (3 seconds apart)

These are the related settings:

checkpoint_completion_target    0.7
checkpoint_segments 64
checkpoint_timeout  5min
checkpoint_warning  30s
wal_block_size  8192
wal_buffers     4MB
wal_keep_segments   5000
wal_level   hot_standby
wal_receiver_status_interval    10s
wal_segment_size    16MB
wal_sync_method     fdatasync
wal_writer_delay    200ms
work_mem    96MB
shared_buffers  24GB
effective_cache_size    128GB

So that means we're writing 1024 MB worth of WAL files every 2 - 5 seconds, sometimes sustained for 15 - 30 minutes.

1) Do you see any settings we can improve on? Let me know if you need other settings documented.

2) Could we use "SET LOCAL synchronous_commit TO OFF;" at the beginning of these write-intensive transactions to let these WAL writes happen a bit more in the background, having less impact on the rest of the operations?

The data we're rebuilding is stored elsewhere, so on the off chance the power failed AND the RAID battery backup didn't do it's job, we're not out anything once the dataset gets rebuilt again.

Would "SET LOCAL synchronous_commit TO OFF;" cause any problems if this continues for 15 - 30 minutes? Or cause any problems with our streaming replication, which uses WAL senders?

Thanks!

PS. I'm hoping Samsung starts shipping their SM1715 3.2 TB PCIe enterprise SSD, since I think it would solve our problems nicely.

like image 775
user1517922 Avatar asked Jan 15 '15 20:01

user1517922


People also ask

What are PostgreSQL checkpoints?

Description. A checkpoint is a point in the write-ahead log sequence at which all data files have been updated to reflect the information in the log. All data files will be flushed to disk.

What is Max_wal_size?

max_wal_size ( integer ) Maximum size to let the WAL grow during automatic checkpoints. This is a soft limit; WAL size can exceed max_wal_size under special circumstances, such as heavy load, a failing archive_command , or a high wal_keep_size setting. If this value is specified without units, it is taken as megabytes.

What is min_ wal_ size?

min_wal_size puts a minimum on the amount of WAL files recycled for future usage; that much WAL is always recycled for future use, even if the system is idle and the WAL usage estimate suggests that little WAL is needed.

What is Shared_buffers PostgreSQL?

shared_buffers. PostgreSQL uses 'double buffering', meaning that PostgreSQL uses its own internal buffer as well as kernel buffered IO. In short, this means that data is stored in memory twice. The PostgreSQL buffer is named shared_buffers and it defines how much dedicated system memory PostgreSQL will use for cache.


1 Answers

Your server is generating so much WAL data due to the wal_level set to hot_standby. I'm assuming you need this, so the best option to avoid the warnings is to increase your checkpoint_segments. But they are just that - warnings - it's quite common and perfectly normal to see them during bulk updates and data loads. You just happen to be updating frequently.

Changing synchronous_commit does not change what is written to the WAL, but rather the timing of when the commit returns to allow the OS to buffer those writes.

It may not apply to your schema, but you could potentially save some WAL data by using unlogged tables for your data rebuilds. Your replicas wouldn't have access to those tables, but after the rebuild you would be able to update your logged tables from their unlogged siblings.

like image 150
Ben Grimm Avatar answered Sep 21 '22 04:09

Ben Grimm