
Python3 parallel code via multiprocessing.pool is slower than sequential code

Here is some code I wrote for parallelizing the creation of a dictionary

parallelize.py

if __name__ == "__main__":
  import time

  from multiprocessing import Pool

  def assign_dict(alist):
    return {x: x for x in alist}


  my_dict = {}
  size = 10000000
  threshold = 10000000
  my_list = list(range(size))

  # Sequential baseline
  start = time.time()
  my_dict = assign_dict(my_list)
  end = time.time()
  print("check seq", end - start, " sec")

  my_dict = {}

  # Split the input into chunks of `threshold` items, one per task
  chunks = [my_list[i * threshold:(i + 1) * threshold] for i in range(int(size / threshold))]
  process_count = 7
  pool = Pool(processes=process_count)

  start = time.time()
  inter_list = pool.map_async(assign_dict, chunks)
  inter_list.wait()
  inter_list = inter_list.get()

  # Merge the per-chunk dicts returned by the workers
  for some_dict in inter_list:
    print("Combining...", time.time() - start, " sec elapsed")
    my_dict = {**my_dict, **some_dict}

  print("check 152002 as key ", my_dict[152002])
  end = time.time()
  print("check parallel", end - start, " sec")

Here is the output for size 1 million with threshold 1 million:

check seq 0.6559352874755859  sec
Combining... 4.751460790634155  sec elapsed
check 152002 as key  152002
check parallel 5.064720869064331  sec

Here is the output for size 10 million with threshold 1 million:

check seq 0.668889045715332  sec
Combining... 1.6871337890625
Combining... 1.7269806861877441
Combining... 1.860083818435669
Combining... 2.0794677734375
Combining... 2.266465663909912
Combining... 2.47836971282959
Combining... 2.8915648460388184
Combining... 3.2443037033081055
Combining... 3.6063129901885986
Combining... 3.9933629035949707
check 115202 as key  1152002
check parallel 4.406447887420654  sec

Here is the output for size 100 million with threshold 10 million. The worst part is that even before the combining starts, the map_async call alone takes about 55 seconds, compared to 19 seconds for the whole sequential run.

check seq 19.182615041732788  sec
Combining... 55.18172788619995
Combining... 56.38586497306824
Combining... 58.534785747528076
Combining... 61.805513858795166
Combining... 64.75091290473938
Combining... 71.34392070770264
Combining... 76.02847385406494
Combining... 81.2545096874237
Combining... 87.75674867630005
Combining... 109.01232576370239
check 115202 as key  1152002
check parallel 126.1939218044281  sec

Even though I tried various combinations of size and threshold, the code with the pool is always slower, so it is not that the threshold is too big, since the sequential version runs very quickly. Even when the size is the same as the threshold (a single chunk), the pooled version is many seconds slower.

Even for long-running jobs like size = 1000 million, the parallelized version is far slower than the sequential one, meaning there is no benefit from the parallelization. I have 8 cores and 16 GB of RAM and I am running macOS; I even verified in Activity Monitor that my cores were running in parallel to execute the task, yet it is slower. The combine phase does not take much time, as shown above: by the time the inter_list.get() call returns, the parallel part is already done, so it cannot be interfering with the dictionary combining.
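To make that concrete, here is a minimal timing sketch (a variation on the code above, with illustrative sizes) that times the pool phase and the merge phase separately; it also merges with dict.update instead of rebuilding the dictionary on every pass:

import time
from multiprocessing import Pool

# Worker defined at module level so child processes can import it
# (needed when the "spawn" start method is used, e.g. on newer macOS/Python).
def assign_dict(alist):
  return {x: x for x in alist}

if __name__ == "__main__":
  size, threshold = 10000000, 1000000    # illustrative sizes
  chunks = [list(range(i, min(i + threshold, size))) for i in range(0, size, threshold)]

  with Pool(processes=7) as pool:
    t0 = time.time()
    partial_dicts = pool.map_async(assign_dict, chunks).get()  # blocks until all workers finish
    t1 = time.time()

  merged = {}
  for d in partial_dicts:
    merged.update(d)                      # in-place merge, no new dict per step
  t2 = time.time()

  print("pool phase ", t1 - t0, " sec")
  print("merge phase", t2 - t1, " sec")

Consistent with the numbers above, the pool phase is where almost all of the time goes.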

Can anyone please parallelize this code so that it is faster than the sequential version in Python 3, or at least help me understand why this is happening?

asked Oct 04 '18 by devssh


1 Answer

Your multiprocessing version is slower than the sequential version because of the inter-process communication required to pass the Pool.map results from the workers back to the process from which the workers were forked.

Because of the GIL, Python's multiprocessing library is the suggested way to parallelize CPU-intensive tasks. However, this means that each worker in the Pool runs in its own virtual memory address space, so the results of Pool.map must be serialized in the workers, passed between processes, and deserialized in the parent. Because the result of your Pool.map is so large, your program spends most of its time serializing and deserializing the answers and shuttling them between processes. In the sequential version there is only a single process, so the result never needs to be serialized, passed between processes, and deserialized, which is probably why it runs faster in this case.
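To get a feel for that cost, here is a small, self-contained sketch (not from the original post) that times pickling and unpickling a single per-chunk dictionary; the Pool has to do this for every chunk it sends back:

import pickle
import time

# One per-chunk result of roughly the size used in the question (illustrative).
chunk_dict = {x: x for x in range(1000000)}

t0 = time.time()
blob = pickle.dumps(chunk_dict, protocol=pickle.HIGHEST_PROTOCOL)
t1 = time.time()
restored = pickle.loads(blob)
t2 = time.time()

print("pickle  ", t1 - t0, " sec,", len(blob), "bytes")
print("unpickle", t2 - t1, " sec")

Multiply that by the number of chunks, add the cost of writing the bytes through a pipe and reading them back in the parent, and the gap between the parallel and sequential timings becomes much easier to explain.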

If you want to avoid this bottleneck, try Python's shared-memory arrays (multiprocessing.Array, or multiprocessing.shared_memory in Python 3.8+), which sidestep the inter-process communication bottleneck because the underlying buffer is visible to all worker processes. If you really need a key-value mapping, then look into multiprocessing.Manager().dict().
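As a rough illustration of the shared-memory route (a sketch of mine, not the original poster's code), each worker below writes its slice of a multiprocessing.Array in place, so no large result is ever pickled and sent back to the parent:

from multiprocessing import Process, Array

# Worker: fill a slice of the shared buffer in place.
def fill_slice(shared, start, stop):
  for i in range(start, stop):
    shared[i] = i

if __name__ == "__main__":
  size, workers = 10000000, 4
  shared = Array('l', size, lock=False)   # unsynchronized shared buffer of C longs

  step = size // workers
  procs = []
  for w in range(workers):
    stop = size if w == workers - 1 else (w + 1) * step
    procs.append(Process(target=fill_slice, args=(shared, w * step, stop)))
  for p in procs:
    p.start()
  for p in procs:
    p.join()

  print(shared[152002])                   # 152002, with no per-result pickling

Note that a shared Array only works for data with a flat, fixed-type layout; multiprocessing.Manager().dict() gives you a real shared mapping, but every access goes through a proxy to a manager process, so it adds its own communication cost.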

In general, Pool.map works well when you can parallelize a computation that does not produce a large quantity of result data.
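For contrast, a hedged example of the kind of workload where Pool.map usually does pay off: each task does a noticeable amount of computation but returns a single number, so there is almost nothing to serialize:

from multiprocessing import Pool

# Each task returns one integer, so the result is cheap to send back.
def chunk_sum_of_squares(bounds):
  lo, hi = bounds
  return sum(x * x for x in range(lo, hi))

if __name__ == "__main__":
  size, threshold = 10000000, 1000000
  bounds = [(i, min(i + threshold, size)) for i in range(0, size, threshold)]

  with Pool(processes=7) as pool:
    total = sum(pool.map(chunk_sum_of_squares, bounds))

  print(total)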

answered Oct 03 '22 by Jon Deaton