 

Python: Multicore processing?

I've been reading about Python's multiprocessing module. I still don't think I have a very good understanding of what it can do.

Let's say I have a quadcore processor and I have a list with 1,000,000 integers and I want the sum of all the integers. I could simply do:

list_sum = sum(my_list) 

But this only sends it to one core.

Is it possible, using the multiprocessing module, to divide the list up and have each core compute the sum of its part and return the value, so that the total sum can be computed?

Something like:

core1_sum = sum(my_list[0:500000])          # goes to core 1
core2_sum = sum(my_list[500000:1000000])    # goes to core 2
all_core_sum = core1_sum + core2_sum        # core 3 does final computation

Any help would be appreciated.

asked Jul 25 '09 by Nope



2 Answers

Yes, it's possible to do this summation over several processes, very much like doing it with multiple threads:

from multiprocessing import Process, Queue

def do_sum(q, l):
    q.put(sum(l))

def main():
    my_list = range(1000000)

    q = Queue()

    p1 = Process(target=do_sum, args=(q, my_list[:500000]))
    p2 = Process(target=do_sum, args=(q, my_list[500000:]))
    p1.start()
    p2.start()
    r1 = q.get()
    r2 = q.get()
    print(r1 + r2)

if __name__ == '__main__':
    main()

However, doing this with multiple processes is likely slower than doing it in a single process, as copying the data back and forth is more expensive than summing it right away.
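The two-process split above generalizes to any number of workers with multiprocessing.Pool. A minimal sketch, with the same caveat that for a cheap operation like summing, the copying overhead usually dominates (the names chunk_sum and parallel_sum are mine, not from the answer):

```python
from multiprocessing import Pool

def chunk_sum(chunk):
    # Each worker process sums its own contiguous slice of the list.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Split the data into one chunk per worker (last chunk may be shorter).
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Sum the partial sums returned by the workers.
        return sum(pool.map(chunk_sum, chunks))

if __name__ == '__main__':
    print(parallel_sum(list(range(1_000_000))))  # 499999500000
```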

answered Sep 22 '22 by Martin v. Löwis


Welcome to the world of concurrent programming.

What Python can (and can't) do depends on two things.

  1. What the OS can (and can't) do. Most OSes allocate processes to cores. To use 4 cores, you need to break your problem into four processes. This is easier than it sounds. Sometimes.

  2. What the underlying C libraries can (and can't) do. If the C libraries expose features of the OS AND the OS exposes features of the hardware, you're solid.

To break a problem into multiple processes -- especially in GNU/Linux -- is easy. Break it into a multi-step pipeline.

In the case of summing a million numbers, think of the following shell script. Assuming some hypothetical sum.py program that sums either a range of numbers or a list of numbers on stdin.

( sum.py 0 500000 & sum.py 500000 1000000 ) | sum.py

This would have 3 concurrent processes. Two are summing a lot of numbers; the third sums just two.
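The hypothetical sum.py assumed above could look roughly like this (this is my sketch of the helper the answer describes, not code from the answer): with two arguments it sums that integer range, with no arguments it sums one integer per line from stdin.

```python
# sum.py -- hypothetical helper assumed by the shell pipeline above.
import sys

def sum_range(start, stop):
    # Sum the integers start..stop-1.
    return sum(range(start, stop))

def sum_lines(lines):
    # Sum one integer per line, skipping blank lines.
    return sum(int(line) for line in lines if line.strip())

def main():
    if len(sys.argv) == 3:
        print(sum_range(int(sys.argv[1]), int(sys.argv[2])))
    else:
        print(sum_lines(sys.stdin))

if __name__ == '__main__':
    main()
```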

Since the GNU/Linux shells and the OS already handle some parts of concurrency for you, you can design simple (very, very simple) programs that read from stdin, write to stdout, and are designed to do small parts of a large job.

You can try to reduce the overheads by using subprocess to build the pipeline instead of allocating the job to the shell. You may find, however, that the shell builds pipelines very, very quickly. (It was written directly in C and makes direct OS API calls for you.)
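A sketch of what building that pipeline with subprocess might look like. To keep it self-contained, the producer and consumer here are inline `python -c` one-liners standing in for the hypothetical sum.py; the function name pipeline_sum is mine:

```python
import subprocess
import sys

# Producer: prints the sum of an integer range given as two arguments.
PRODUCER = "import sys; print(sum(range(int(sys.argv[1]), int(sys.argv[2]))))"
# Consumer: sums one integer per line from stdin, like the final sum.py.
CONSUMER = "import sys; print(sum(int(line) for line in sys.stdin))"

def pipeline_sum(n, split):
    # Start both producers concurrently, each computing a partial sum.
    p1 = subprocess.Popen([sys.executable, '-c', PRODUCER, '0', str(split)],
                          stdout=subprocess.PIPE, text=True)
    p2 = subprocess.Popen([sys.executable, '-c', PRODUCER, str(split), str(n)],
                          stdout=subprocess.PIPE, text=True)
    partials = p1.communicate()[0] + p2.communicate()[0]
    # Feed the two partial sums to the consumer, as the shell '|' would.
    done = subprocess.run([sys.executable, '-c', CONSUMER],
                          input=partials, capture_output=True, text=True)
    return int(done.stdout)

if __name__ == '__main__':
    print(pipeline_sum(1000000, 500000))  # 499999500000
```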

answered Sep 23 '22 by S.Lott