 

PostgreSQL 10 - Parallel configuration

There are four configuration settings for enabling parallelism and tuning it, but the PostgreSQL documentation doesn't say anything about how to choose or calculate their values. My questions are:

1. How do I calculate the values of max_parallel_workers, max_parallel_workers_per_gather and max_worker_processes?

2. work_mem can be calculated based on the number of connections and the available memory (RAM), but does work_mem need to change if I enable parallelism?

My supposition is: if the machine has 8 cores, then max_parallel_workers is 8, and the values of max_worker_processes and max_parallel_workers_per_gather are 32 (8 * 4). The factor of 4 I took from the original configuration, which has 4 gathers per parallel worker.
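To make that supposition concrete, here it is written out as postgresql.conf entries (this is only my guess spelled out, not a known-good configuration):

max_parallel_workers = 8                # one per core
max_worker_processes = 32               # 8 cores * 4
max_parallel_workers_per_gather = 32    # 8 cores * 4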

asked Oct 30 '18 by HudsonPH


3 Answers

There is a nice little online configuration utility that helps to set the main postgresql.conf values.

It is not perfect, but it gives a starting point that may be helpful.

https://pgtune.leopard.in.ua/#/
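For illustration, this is the general shape of what it suggests for a hypothetical 4-core, 8 GB machine (the exact values depend on the workload type you select, so treat these numbers as placeholders rather than recommendations):

max_connections = 100
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 10MB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4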

answered Oct 14 '22 by Jimski


After some searching I found some answers. This can help anyone who wants to enable parallelism and needs a base configuration. Say you have 4 cores (CPUs):

Your max_worker_processes should be the number of cores, and max_parallel_workers needs the same value:

max_worker_processes = 4
max_parallel_workers = 4
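Note that max_worker_processes only takes effect after a server restart. One way to set and then verify these values (standard PostgreSQL commands, shown here as a sketch):

ALTER SYSTEM SET max_worker_processes = 4;
-- restart the server, then check:
SHOW max_worker_processes;
SHOW max_parallel_workers;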

The per-gather value is more complex, because it can be adjusted based on your needs and resources; you have to test to find the best value, but as a starting point you can use cores/2:

max_parallel_workers_per_gather = 2
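One way to test a candidate value is to set it for the current session only and check whether the planner actually uses parallel workers (the table name below is just a placeholder):

SET max_parallel_workers_per_gather = 2;  -- session-level, no restart needed
EXPLAIN ANALYZE SELECT count(*) FROM some_big_table;
-- look for a Gather node reporting "Workers Planned: 2" and "Workers Launched: 2"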

This is not a final answer and there are some missing points... I am still searching and will update this answer, or wait for a better one.

answered Oct 14 '22 by HudsonPH


The question is quite obvious, but the answer is not.

I'll try to describe it a bit more broadly, so if something sounds obvious to you, just skip it.

First, how it works and what those parameters are for is described in the PostgreSQL documentation. In other words: PG has a pool of processes that can do things in the background, and the maximum number of them is limited by max_worker_processes. A table scan can take a long, long time, so it is wise to have more processes fetching data. That can be done in the background, by... background workers. The query plan nodes that can be handled by them are Gather and Gather Merge.
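For example, a parallel plan looks roughly like this (table name and costs are illustrative):

EXPLAIN SELECT * FROM accounts WHERE filler LIKE '%x%';

 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)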

Each background worker has its own memory - for sorting and other things related to execution (that is what work_mem governs). They are there all the time, so it is better to have that in mind, just to be sure the system is not using swap...
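As a rough worked example of the memory impact (numbers are illustrative): each worker can use up to work_mem per sort or hash operation, so

work_mem = 64MB
max_parallel_workers_per_gather = 4
# one parallel sort: (1 leader + 4 workers) * 64MB = 320MB, per query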

Besides that - try to find the best number of workers per query; by default that is 2. So if everything works fine, two background workers are used to gather data. The next question is how many queries are executed in parallel - I mean the heavy queries that require parallel processing. With those two numbers - say 4 workers per query and 10 such queries - 40 workers are needed for that alone. You can calculate whether that is OK, or experiment with it. One way or the other, there is one more parameter - max_worker_processes: beyond those 40 workers for parallel processing, you need more workers for other tasks, like replication.
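Written out as a configuration sketch with those illustrative numbers (4 workers per heavy query, 10 such queries at once, plus headroom):

max_parallel_workers_per_gather = 4   # per heavy query
max_parallel_workers = 40             # 4 workers * 10 concurrent heavy queries
max_worker_processes = 48             # the 40 above, plus headroom for replication, extensions, etc.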

Does that 40 sound reasonable? There are two counterpoints here. On one hand, PG is by default an OLTP database, so the system is tuned for something else, and this kind of change can bring good results. On the other hand, there is a single bgwriter, so in the end one process deals with I/O. That depends on the system, but still - one process.

So the answer is far from perfect - you need to experiment, collect your own stats, and decide.
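One way to collect those stats on PostgreSQL 10 and later is to watch how many parallel workers are actually running while your workload executes:

SELECT pid, backend_type, query
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';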

answered Oct 14 '22 by Michał Zaborowski