Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallelism in Python

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

  1. Message passing processes, possibly distributed across a cluster, e.g. MPI.
  2. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et. al
  3. Implicit shared memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs.

In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets.

In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

like image 472
fmark Avatar asked Jun 07 '10 08:06

fmark


People also ask

Can Python run in parallel?

Multiprocessing in Python enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel. Multiprocessing enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel.

Is Python threading parallel or concurrent?

Threading is just one of the many ways concurrent programs can be built. In this article, we will take a look at threading and a couple of other strategies in building concurrent programs in Python, as well as discuss how each is suitable in different scenarios.

How does concurrency work in Python?

What Is Concurrency? The dictionary definition of concurrency is simultaneous occurrence. In Python, the things that are occurring simultaneously are called by different names (thread, task, process) but at a high level, they all refer to a sequence of instructions that run in order.

How do you get concurrency in Python?

Many times the concurrent processes need to access the same data at the same time. Another solution, than using of explicit locks, is to use a data structure that supports concurrent access. For example, we can use the queue module, which provides thread-safe queues. We can also use multiprocessing.


1 Answers

Generally, you describe a CPU bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.

You can write your performance critical code in C or Cython, and use Python for the glue.

like image 69
Will Avatar answered Sep 19 '22 09:09

Will