I'm running postgresql 9.3 on a machine with 32GB ram, with 0 swap. There are up to 200 clients connected. There's 1 other 4GB process running on the box. How do I interpret this error log message? How can I prevent the out of memory error? Allow swapping? Add more memory to the machine? Allow fewer client connections? Adjust a setting?
example pg_top:
last pid: 6607; load avg: 3.59, 2.32, 2.61; up 16+09:17:29 20:49:51
113 processes: 1 running, 111 sleeping, 1 uninterruptable
CPU states: 22.5% user, 0.0% nice, 4.9% system, 63.2% idle, 9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps, 1 rollbs/s, 217 buffer r/s, 99 hit%, 11994 row r/s, 3820 row w/s
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap:
example top showing the only other significant 4GB process on the box:
top - 21:05:09 up 16 days, 9:32, 2 users, load average: 2.73, 2.91, 2.88
Tasks: 247 total, 3 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.1 us, 4.1 sy, 0.0 ni, 62.9 id, 9.8 wa, 0.0 hi, 0.7 si, 0.3 st
KiB Mem: 30827220 total, 30642584 used, 184636 free, 7292 buffers
KiB Swap: 0 total, 0 used, 0 free. 23449636 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7407 postgres 20 0 7604928 10172 7932 S 29.6 0.0 2:51.27 postgres
10469 postgres 20 0 7617716 176032 160328 R 11.6 0.6 0:01.48 postgres
10211 postgres 20 0 7630352 237736 208704 S 10.6 0.8 0:03.64 postgres
18202 elastic+ 20 0 8726984 4.223g 4248 S 9.6 14.4 883:06.79 java
9711 postgres 20 0 7619500 354188 335856 S 7.0 1.1 0:08.03 postgres
3638 postgres 20 0 7634552 1.162g 1.127g S 6.6 4.0 0:50.42 postgres
postgresql.conf:
max_connections = 1000 # (change requires restart)
shared_buffers = 7GB # min 128kB
work_mem = 40MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
effective_cache_size = 20GB
....
log:
ERROR: out of memory
DETAIL: Failed on request of size 67108864.
STATEMENT: SELECT "package_texts".* FROM "package_texts" WHERE "package_texts"."id" = $1 LIMIT 1
TopMemoryContext: 798624 total in 83 blocks; 11944 free (21 chunks); 786680 used
TopTransactionContext: 8192 total in 1 blocks; 7328 free (0 chunks); 864 used
Prepared Queries: 253952 total in 5 blocks; 136272 free (18 chunks); 117680 used
Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 24576 total in 2 blocks; 5696 free (4 chunks); 18880 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 568 free (0 chunks); 456 used
ExecutorState: 32928 total in 3 blocks; 15616 free (5 chunks); 17312 used
printtup: 34002024 total in 2 blocks; 7056 free (7 chunks); 33994968 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 12832 free (3 chunks); 11744 used
CacheMemoryContext: 1372624 total in 24 blocks; 38832 free (0 chunks); 1333792 used
CachedPlanSource: 7168 total in 3 blocks; 3080 free (1 chunks); 4088 used
CachedPlanQuery: 7168 total in 3 blocks; 2992 free (1 chunks); 4176 used
CachedPlanSource: 15360 total in 4 blocks; 7128 free (5 chunks); 8232 used
CachedPlanQuery: 15360 total in 4 blocks; 3320 free (1 chunks); 12040 used
CachedPlanSource: 3072 total in 2 blocks; 552 free (0 chunks); 2520 used
CachedPlanQuery: 7168 total in 3 blocks; 1592 free (1 chunks); 5576 used
CachedPlanSource: 3072 total in 2 blocks; 536 free (0 chunks); 2536 used
... 2 Thousand snipped lines of CachedPlans ...
CachedPlanSource: 15360 total in 4 blocks; 7128 free (5 chunks); 8232 used
CachedPlanQuery: 15360 total in 4 blocks; 3320 free (1 chunks); 12040 used
CachedPlanSource: 7168 total in 3 blocks; 3880 free (3 chunks); 3288 used
CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
CachedPlanSource: 7168 total in 3 blocks; 3936 free (3 chunks); 3232 used
CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
CachedPlanSource: 7168 total in 3 blocks; 3080 free (1 chunks); 4088 used
CachedPlanQuery: 7168 total in 3 blocks; 2992 free (1 chunks); 4176 used
CachedPlanSource: 7168 total in 3 blocks; 3872 free (2 chunks); 3296 used
CachedPlanQuery: 7168 total in 3 blocks; 4032 free (1 chunks); 3136 used
pg_toast_17305_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
index_package_raises_on_natural_key: 3072 total in 2 blocks; 1648 free (1 chunks); 1424 used
index_package_extensions_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
index_package_mixins_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
index_package_mixins_on_includes_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
package_texts_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
index_package_file_objects_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
index_package_symbols_on_natural_key: 3072 total in 2 blocks; 1136 free (1 chunks); 1936 used
index_package_symbols_on_full_name: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
index_package_symbols_on_alias_for_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
package_symbols_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_toast_17313_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
index_packages_on_natural_key: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
packages_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
index_package_files_on_natural_key: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
package_files_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_toast_2619_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
index_projects_on_user_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
index_projects_on_type: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
index_projects_on_name_and_type: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
index_projects_on_claim_ticket: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
ruby_gem_metadata_pkey: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_contypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_attrdef_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_constraint_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used
pg_range_rngtypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_namespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used
pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used
pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used
pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
MdSmgr: 24576 total in 2 blocks; 13984 free (0 chunks); 10592 used
ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
hba parser context: 7168 total in 3 blocks; 304 free (1 chunks); 6864 used
LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (6 chunks); 32 used
If I'm reading the output of your top correctly, it's not taken at a point when you're out of memory.
The actual error seems fine - it's not requesting a huge amount of memory so presumably the machine was out of memory at that point.
Let's take a quick look at your settings:
max_connections = 1000 # (change requires restart)
work_mem = 40MB # min 64kB
So - you are of the opinion that you can support 1000 concurrent queries each using say 10 + 40MB (some might use multiples of 40MB but let's be reasonable). So - this is suggesting to me that your machine has > 500 cores and say 100GB of RAM. That's not the case.
So - take your number of cores and double it - that's a reasonable value for the max number of connections. That will allow you one query on each core while another is waiting for I/O. Then, place a connection pooler in front of the DB if you need to (pgbouncer / Java's connection pooling).
Then, you might even consider increasing work_mem if you need to.
Oh - perfectly reasonable to run without swap enabled. Once you start swapping you are in a world of pain anyway as regards database usage.
Edit: expand on work_mem vs shared
If in doubt, always refer to the documentation.
The shared_buffers
value is, as the name suggests shared between backends. The work_mem
is not only per backend, it's actually per sort. So - one query might use three or four times that amount if it is doing sorts on three subqueries.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With