from multiprocessing import Pool
def op1(data):
    return [data[elem] + 1 for elem in range(len(data))]
data = [[elem for elem in range(20)] for elem in range(500000)]
import time
start_time = time.time()
re = []
for data_ in data:
    re.append(op1(data_))
print('--- %s seconds ---' % (time.time() - start_time))
start_time = time.time()
pool = Pool(processes=4)
data = pool.map(op1, data)
print('--- %s seconds ---' % (time.time() - start_time))
I get a much slower run time with pool than I get with the plain for loop. But isn't pool supposed to be using 4 processes to do the computation in parallel?
So, multiprocessing is faster when the program is CPU-bound. When your program spends most of its time waiting for I/O to complete, threading may be more efficient, because the threads can overlap those waits. For CPU-bound work, however, multiprocessing is generally the better choice: each process has its own interpreter and its own GIL, so the work can run in parallel on separate cores.
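To make that distinction concrete, here is a minimal sketch (the helper names cpu_bound and io_bound and the sizes are made up for illustration): a CPU-bound loop only speeds up with processes, because each process has its own GIL, while simulated I/O waits overlap fine with threads.

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_bound(n):
    # Pure computation: it holds the GIL, so threads cannot run it in parallel.
    return sum(i * i for i in range(n))

def io_bound(delay):
    # Simulated I/O wait: time.sleep releases the GIL, so threads overlap nicely.
    time.sleep(delay)
    return delay

if __name__ == '__main__':
    work = [2_000_000] * 8
    with Pool(processes=4) as procs:
        start = time.perf_counter()
        procs.map(cpu_bound, work)          # runs in parallel on separate cores
        print('processes, CPU-bound:', time.perf_counter() - start)
    with ThreadPool(processes=4) as threads:
        start = time.perf_counter()
        threads.map(io_bound, [0.5] * 8)    # threads overlap the waiting
        print('threads, I/O-bound  :', time.perf_counter() - start)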
The Python multiprocessing Pool class lets you create and manage a pool of worker processes. Although the Pool has been available in Python for a long time, it is not widely used, perhaps because of misunderstandings about the capabilities and limitations of processes and threads in Python.
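For reference, the basic pattern looks roughly like this (a minimal sketch; square is just a placeholder worker function):

from multiprocessing import Pool

def square(x):
    # The worker function must be picklable (defined at module top level).
    return x * x

if __name__ == '__main__':
    # Entering the with-block starts the worker processes; leaving it closes and joins them.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, ...]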
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
Short answer: Yes, the operations will usually be done on (a subset of) the available cores. But the communication overhead is large. In your example the workload is too small compared to the overhead.
When you construct a pool, a number of worker processes are started. If you then instruct it to map a function over a given input, the following happens:
- the input list is split into chunks;
- each chunk is serialized and sent to a worker process;
- each worker computes the results for its chunk;
- the results are serialized and sent back to the main process; and
- the main process joins the partial results into a single list.
Now splitting, communicating, and joining the data are all steps carried out by the main process, and they cannot be parallelized. Since the actual operation is fast (O(n) in the input size n), the overhead has the same time complexity as the computation itself.
So, complexity-wise, even if you had millions of cores it would not make much difference, because communicating the list is probably already more expensive than computing the results.
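You can make that point visible by timing the serialization on its own. A rough sketch (multiprocessing pickles the data it sends between processes, so pickle.dumps is a reasonable stand-in for the communication cost):

import pickle
import time

def op1(data):
    return [x + 1 for x in data]

data = [list(range(20)) for _ in range(500000)]

start = time.perf_counter()
_ = [op1(row) for row in data]
print('compute only  :', time.perf_counter() - start)

start = time.perf_counter()
_ = pickle.dumps(data)   # roughly what the main process pays just to ship the input
print('serialize only:', time.perf_counter() - start)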
That's why you should parallelize computationally expensive tasks, not trivial ones: the amount of processing should be large compared to the amount of communication. In your example, the work is trivial: you add 1 to every element. Serializing, however, is less trivial: you have to encode every list you send to the workers.
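By contrast, if each task does plenty of computation while only a small amount of data crosses the process boundary, the pool should come out ahead. A hedged sketch (the busy function and the task sizes are arbitrary):

import time
from multiprocessing import Pool

def busy(n):
    # Plenty of computation per task, but only one small int travels each way.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    tasks = [3_000_000] * 16

    start = time.perf_counter()
    serial = [busy(n) for n in tasks]
    print('serial  :', time.perf_counter() - start)

    start = time.perf_counter()
    with Pool(processes=4) as pool:
        parallel = pool.map(busy, tasks)
    print('parallel:', time.perf_counter() - start)

    assert serial == parallel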
There are a couple of potential trouble spots with your code, but primarily it's too simple.
The multiprocessing module works by creating separate processes and communicating among them. For each process created, you have to pay the operating system's process startup cost as well as the Python interpreter's startup cost. Those costs can be high or low, but they're non-zero in any case.
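You can get a feel for that fixed cost by timing a pool that does essentially no work (a rough sketch; the number depends heavily on the OS and the start method):

import time
from multiprocessing import Pool

if __name__ == '__main__':
    start = time.perf_counter()
    with Pool(processes=4) as pool:
        pool.map(abs, [1, 2, 3, 4])   # trivial work: the time is almost all startup and teardown
    print('pool startup + teardown:', time.perf_counter() - start)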
Once you pay those startup costs, you then pool.map the worker function across all the processes. Which basically adds 1 to a few numbers. This is not a significant load, as your tests prove.
What's worse, you're using .map(), which is implicitly ordered (compare with .imap_unordered()), so there's synchronization going on, leaving even less freedom for the various CPU cores to give you speed.
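If you don't care about result order, imap_unordered hands results back as soon as each worker finishes; a small sketch of the difference (the work function and chunksize are arbitrary):

from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == '__main__':
    items = range(1000)
    with Pool(processes=4) as pool:
        # .map(): results come back in input order, so a slow early item delays everything after it.
        ordered = pool.map(work, items, chunksize=50)
        # .imap_unordered(): results are yielded as workers finish, in whatever order that happens.
        unordered = list(pool.imap_unordered(work, items, chunksize=50))
    print(sorted(unordered) == ordered)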
If there's a problem here, it's a "design of experiment" problem: you haven't created a sufficiently difficult problem for multiprocessing to be able to help you.