Here is some code I wrote for parallelizing the creation of a dictionary
parallelize.py
import time
from multiprocessing import Pool


def assign_dict(alist):
    # Map every element of the list to itself.
    return {x: x for x in alist}


if __name__ == "__main__":
    size = 10000000
    threshold = 10000000
    my_list = list(range(size))

    # Sequential version.
    start = time.time()
    my_dict = assign_dict(my_list)
    end = time.time()
    print("check seq", end - start, " sec")

    # Parallel version: build one dict per chunk in a worker pool.
    my_dict = {}
    chunks = [my_list[i * threshold:(i + 1) * threshold] for i in range(size // threshold)]
    process_count = 7
    pool = Pool(processes=process_count)

    start = time.time()
    inter_list = pool.map_async(assign_dict, chunks)
    inter_list.wait()
    inter_list = inter_list.get()

    # Merge the per-chunk dicts into a single dict.
    for some_dict in inter_list:
        print("Combining...", time.time() - start, " sec elapsed")
        my_dict = {**my_dict, **some_dict}

    print("check 152002 as key ", my_dict[152002])
    end = time.time()
    print("check parallel", end - start, " sec")
Here is the output for size 1 mil with threshold 1 mil
check seq 0.6559352874755859 sec
Combining... 4.751460790634155 sec elapsed
check 152002 as key 152002
check parallel 5.064720869064331 sec
Here is the output for size 10 mil with threshold 1 mil
check seq 0.668889045715332 sec
Combining... 1.6871337890625
Combining... 1.7269806861877441
Combining... 1.860083818435669
Combining... 2.0794677734375
Combining... 2.266465663909912
Combining... 2.47836971282959
Combining... 2.8915648460388184
Combining... 3.2443037033081055
Combining... 3.6063129901885986
Combining... 3.9933629035949707
check 115202 as key 1152002
check parallel 4.406447887420654 sec
Here is the output for size 100 mil with threshold 10 mil. The worst part here is that even before the combining, the map_async alone still takes 55 secs, compared to the 19 secs in sequential.
check seq 19.182615041732788 sec
Combining... 55.18172788619995
Combining... 56.38586497306824
Combining... 58.534785747528076
Combining... 61.805513858795166
Combining... 64.75091290473938
Combining... 71.34392070770264
Combining... 76.02847385406494
Combining... 81.2545096874237
Combining... 87.75674867630005
Combining... 109.01232576370239
check 115202 as key 1152002
check parallel 126.1939218044281 sec
Even though I tried various combinations of size and threshold, the code with the pool is always slower, so it is not that the threshold is too big, since the sequential version runs very quickly. Even when the size is the same as the threshold, the code with the pool is many seconds slower.
And even for long-running jobs like size = 1000 million, the parallelized version is way slower than sequential execution, meaning that there is no benefit from parallelization. I have 8 cores and 16 GB RAM, I am running macOS, and I even verified in Activity Monitor that my cores were running in parallel to execute the task, yet it is slower. The combine phase does not take much time, as shown: by the time the inter_list.get() call returns, the parallel part is already done, so it cannot interfere with combining the dictionaries.
Can anyone please parallelize this code to be faster than the sequential version in Python 3 or at least help me understand why this is happening?
Parallel can actually be slower than serial when, for example, there is only one processor (a single-core CPU), so the operations cannot actually run in parallel. In that case the parallel version still pays a small cost for switching context between processes, a cost that is not needed when processing serially.
In multiprocessing, multiple Python processes are created and used to execute a function instead of multiple threads, bypassing the Global Interpreter Lock (GIL) that can significantly slow down threaded Python programs.
Your multiprocessing version is slower than the sequential version because of the inter-process communication required to pass the results of Pool.map from the workers back to the process from which the workers were forked.
Because of the GIL, Python's multiprocessing library is the suggested way to run CPU-intensive tasks in parallel. However, this means that each worker in the Pool has its own virtual memory address space, and therefore the results of Pool.map must be serialized and passed between processes. Because the result of your Pool.map is so large, your program spends a large amount of time serializing/deserializing the answers and passing them between processes. In the sequential version there is only a single process, so the result never needs to be serialized, passed between processes, and then deserialized, which is probably why it runs faster in this case.
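You can get a rough feel for this cost yourself by timing the pickling of one dict the size of what the workers collectively send back. This is only a back-of-the-envelope sketch (the size value is my own choice to match the 10 million entry case, and Pool additionally has to push the bytes through a pipe, so the real overhead is even higher):

import pickle
import time

size = 10000000  # total number of entries that have to cross process boundaries

start = time.time()
d = {x: x for x in range(size)}
print("build dict:  ", time.time() - start, "sec")

start = time.time()
payload = pickle.dumps(d)          # roughly what the worker side has to do
print("serialize:   ", time.time() - start, "sec,", len(payload) / 1e6, "MB")

start = time.time()
pickle.loads(payload)              # roughly what the parent side has to do
print("deserialize: ", time.time() - start, "sec")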
If you want to avoid this bottleneck, try using Python's shared-memory arrays (for example multiprocessing.Array or multiprocessing.shared_memory), which sidestep the inter-process communication since the array is mapped into the virtual address space of all worker processes. If you really need a key-value mapping, then look into Python's multiprocessing.Manager().dict().
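As an illustration of the shared-memory route, here is a minimal sketch. It is not your original code: it assumes the keys are exactly 0..size-1 (as in the question), so a flat shared array indexed by key can stand in for the dict, and the chunking and worker count are arbitrary choices of mine. The workers write into the shared buffer and return nothing, so no large result has to be pickled back to the parent:

import ctypes
from multiprocessing import Pool
from multiprocessing.sharedctypes import RawArray

def init_worker(arr):
    # Give every worker a module-level reference to the shared buffer.
    global shared
    shared = arr

def fill(bounds):
    lo, hi = bounds
    for x in range(lo, hi):
        shared[x] = x      # write straight into shared memory
    # Nothing large is returned, so nothing large gets serialized.

if __name__ == "__main__":
    size = 10000000
    shared = RawArray(ctypes.c_long, size)   # lives in shared memory
    chunks = [(i, min(i + 1000000, size)) for i in range(0, size, 1000000)]
    with Pool(processes=7, initializer=init_worker, initargs=(shared,)) as pool:
        pool.map(fill, chunks)
    print(shared[152002])                    # 152002, like the key check above

Note that Manager().dict() avoids the single big transfer, but every insert then goes through the manager process as a separate message, so for bulk-loading millions of keys it is usually even slower than plain Pool.map.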
In general, Pool.map is good when you can parallelize some computation that does not produce a large quantity of data.
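For contrast, here is a toy example (again my own, not from your post) of the kind of workload where Pool.map typically does pay off: each task does a noticeable amount of CPU work but returns only a single integer, so almost nothing has to be pickled back to the parent:

import time
from multiprocessing import Pool

def sum_of_squares(bounds):
    lo, hi = bounds
    return sum(x * x for x in range(lo, hi))   # result is one int, cheap to send back

if __name__ == "__main__":
    size = 20000000
    chunks = [(i, min(i + 2000000, size)) for i in range(0, size, 2000000)]

    start = time.time()
    sequential = sum(x * x for x in range(size))
    print("sequential:", time.time() - start, "sec")

    start = time.time()
    with Pool(processes=7) as pool:
        parallel = sum(pool.map(sum_of_squares, chunks))
    print("parallel:  ", time.time() - start, "sec")

    assert sequential == parallel

On a multi-core machine the parallel timing should come out ahead here, because the per-task arguments and results are tiny compared to the work each task does.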