Postgres gets out of memory errors despite having plenty of free memory

I have a server running Postgres 9.1.15. The server has 2GB of RAM and no swap. Intermittently Postgres will start getting "out of memory" errors on some SELECTs, and will continue doing so until I restart Postgres or some of the clients that are connected to it. What's weird is that when this happens, free still reports over 500MB of free memory.

select version();:

PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

uname -a:

Linux db 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

postgresql.conf (everything else is commented out / at defaults):

max_connections = 100
shared_buffers = 500MB
work_mem = 2000kB
maintenance_work_mem = 128MB
wal_buffers = 16MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
random_page_cost = 2.0
effective_cache_size = 1000MB
default_statistics_target = 100
log_temp_files = 0

I got these values from pgtune (I chose "mixed type of applications") and have been fiddling with them based on what I've read, without making much real progress. At the moment there are 68 connections, which is a typical number (I'm not using pgbouncer or any other connection pooler yet).
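
As a rough back-of-envelope check (my own arithmetic, not an exact accounting, since work_mem can be used several times within a single query and each backend has some fixed overhead on top):

shared_buffers                       500 MB   (shared, allocated once at startup)
68 connections x work_mem (2 MB)    ~136 MB   (per sort/hash step, so potentially a multiple of this)
maintenance_work_mem                 128 MB   (per autovacuum worker or manual maintenance command)
per-backend overhead                 a few MB per connection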

/etc/sysctl.conf:

kernel.shmmax=1050451968
kernel.shmall=256458

vm.overcommit_ratio=100
vm.overcommit_memory=2

I first changed overcommit_memory to 2 about a fortnight ago after the OOM killer killed the Postgres server. Prior to that the server had been running fine for a long time. The errors I get now are less catastrophic but much more annoying because they are much more frequent.
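
If it helps, this is how I've been checking whether the kernel's commit limit is what's actually being hit (my understanding is that with overcommit_memory=2, allocations start failing once Committed_AS reaches CommitLimit, even while free still reports free memory):

grep -E 'CommitLimit|Committed_AS' /proc/meminfo
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio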

I haven't had much luck pinpointing the first event that causes postgres to run "out of memory" - it seems to be different each time. The most recent time it crashed, the first three lines logged were:

2015-04-07 05:32:39 UTC ERROR:  out of memory
2015-04-07 05:32:39 UTC DETAIL:  Failed on request of size 125.
2015-04-07 05:32:39 UTC CONTEXT:  automatic analyze of table "xxx.public.delayed_jobs"
TopMemoryContext: 68688 total in 10 blocks; 4560 free (4 chunks); 64128 used
[... snipped heaps of lines which I can provide if they are useful ...]

---

2015-04-07 05:33:58 UTC ERROR:  out of memory
2015-04-07 05:33:58 UTC DETAIL:  Failed on request of size 16.
2015-04-07 05:33:58 UTC STATEMENT:  SELECT oid, typname, typelem, typdelim, typinput FROM pg_type
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
2015-04-07 05:33:59 UTC LOG:  could not fork new process for connection: Cannot allocate memory
TopMemoryContext: 396368 total in 50 blocks; 10160 free (28 chunks); 386208 used
[... snipped heaps of lines which I can provide if they are useful ...]

---

2015-04-07 05:33:59 UTC ERROR:  out of memory
2015-04-07 05:33:59 UTC DETAIL:  Failed on request of size 1840.
2015-04-07 05:33:59 UTC STATEMENT:  SELECT... [nested select with 4 joins, 19 ands, and 2 order bys]
TopMemoryContext: 388176 total in 49 blocks; 17264 free (55 chunks); 370912 used

The crash before that, a few hours earlier, just had three instances of that last query as the first three lines of the crash. That query gets run very often, so I'm not sure whether the issue is caused by this query or whether it just shows up in the error log because it's a reasonably complex SELECT that's being run all the time. That said, here's an EXPLAIN ANALYZE of it: http://explain.depesz.com/s/r00

This is what ulimit -a for the postgres user looks like:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15956
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I'll try to get the exact numbers from free the next time there's a crash; in the meantime, this is a brain dump of all the info I have.
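
In case it's useful, here's a rough watcher I'm planning to leave running to capture that (the log path below is Ubuntu's default for 9.1 and may differ on your setup):

# snapshot memory stats whenever Postgres logs an out-of-memory error
tail -F /var/log/postgresql/postgresql-9.1-main.log \
  | grep --line-buffered 'out of memory' \
  | while read -r line; do
      { date; free -m; cat /proc/meminfo; echo; } >> /tmp/pg_oom_snapshots.log
    done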

Any ideas on where to go from here?

asked Apr 07 '15 by Alex Ghiculescu


3 Answers

I just ran into this same issue with a ~2.5 GB plain-text SQL file I was trying to restore. I scaled my Digital Ocean server up to 64 GB RAM, created a 10 GB swap file, and tried again. I got an out-of-memory error with 50 GB free, and no swap in use.

I scaled my server back down to the small 1 GB instance I had been using (which required a reboot) and figured I'd give it another shot, for no other reason than that I was frustrated. I started the import and realized I had forgotten to create my temporary swap file again.

I created the swap file in the middle of the import, and psql made it a lot further before crashing; it got through 5 additional tables.
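
For reference, this is roughly how I create the temporary swap file (the size and path are just what I happened to use):

sudo fallocate -l 10G /swapfile      # or: sudo dd if=/dev/zero of=/swapfile bs=1M count=10240
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# once the restore is done:
sudo swapoff /swapfile && sudo rm /swapfile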

I think there must be a bug in how psql allocates memory.

answered Oct 24 '22 by Aaron C. de Bruyn


Can you check whether there is any swap available when the error occurs?

I completely removed the swap on my Linux desktop (just for testing other things...) and got exactly the same error! I'm pretty sure that's what is going on in your case too.
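
A quick way to check (standard tools, nothing specific to your setup):

swapon -s     # or: swapon --show on newer util-linux
free -m       # the Swap: line shows 0 if no swap is configured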

answered Oct 24 '22 by Christian


It is a bit suspicious that you report the same amount of free memory as your shared_buffers size. Are you sure you are looking at the right values?

The output of the free command at the time of the crash would be useful, as would the contents of /proc/meminfo.

Beware that setting overcommit_memory to 2 is not very effective if you leave overcommit_ratio at 100. It basically limits memory allocation to the size of swap (0 in this case) plus 100% of physical RAM, which doesn't leave any room for shared memory and disk caches.

You should probably set overcommit_ratio to 50.
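
Roughly, the kernel computes the limit as CommitLimit = swap + RAM * overcommit_ratio / 100, so on your 2 GB server with no swap (a sketch of the arithmetic and commands, nothing distribution-specific):

# ratio 100: 0 + 2048 MB * 1.00 = 2048 MB
# ratio  50: 0 + 2048 MB * 0.50 = 1024 MB
sudo sysctl -w vm.overcommit_ratio=50              # applies immediately
# also update vm.overcommit_ratio in /etc/sysctl.conf so it survives a reboot
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # compare the limit with what is committed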

answered Oct 24 '22 by mnencia