Parallelizing a Numpy vector operation

Let's use numpy.sin() as an example.

The following code returns the sine of each value in the array a:

    import numpy

    a = numpy.arange( 1000000 )
    result = numpy.sin( a )

But my machine has 32 cores, so I'd like to make use of them. (The overhead might not be worthwhile for something like numpy.sin() but the function I actually want to use is quite a bit more complicated, and I will be working with a huge amount of data.)

Is this the best (read: smartest or fastest) method:

    from multiprocessing import Pool

    if __name__ == '__main__':
        pool = Pool()
        result = pool.map( numpy.sin, a )

or is there a better way to do this?

What is a vectorized operation in NumPy?

numpy.vectorize defines a vectorized function that takes a nested sequence of objects or NumPy arrays as inputs and returns a single NumPy array or a tuple of NumPy arrays. The vectorized function evaluates pyfunc over successive tuples of the input arrays, like the Python map function, except that it uses NumPy's broadcasting rules.
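As a rough illustration of the numpy.vectorize behaviour described above, here is a minimal sketch. The scalar function clip_diff and the sample arrays are made up for the example. Note that numpy.vectorize is a convenience wrapper around a Python-level loop, so it does not make the function faster or parallel.

```python
import numpy as np

def clip_diff(x, y):
    """Hypothetical scalar function: x - y if x > y, otherwise 0."""
    return x - y if x > y else 0

# Wrap the scalar function; otypes fixes the output dtype explicitly.
vclip = np.vectorize(clip_diff, otypes=[float])

col = np.array([[1.0], [5.0], [9.0]])   # shape (3, 1)
row = np.array([2.0, 4.0])              # shape (2,)

# Broadcasting pairs every element of col with every element of row.
out = vclip(col, row)                   # shape (3, 2)
print(out)
```

The (3, 1) and (2,) inputs are broadcast to a common (3, 2) shape before clip_diff is called on each pair.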

Are NumPy operations parallelized?

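Most NumPy operations, including element-wise functions such as numpy.sin, run on a single CPU core and do not use the GPU. (Some linear algebra routines can use several threads when NumPy is linked against a multithreaded BLAS library.) Numba, on the other hand, can compile functions so that they execute in parallel across the cores of your computer.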

What does [:, :] mean on NumPy arrays?

The [:, :] stands for everything from the beginning to the end, just like for lists. The difference is that the first : stands for the first dimension and the second : for the second dimension.

    a = numpy.zeros((3, 3))

    In [132]: a
    Out[132]:
    array([[ 0.,  0.,  0.],
           [ 0.,  0.,  0.],
           [ 0.,  0.,  0.]])
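A small sketch of how the two colons select along each dimension; the array contents are arbitrary and chosen only so the selected rows and columns are easy to tell apart.

```python
import numpy

a = numpy.arange(9).reshape(3, 3)   # rows: [0 1 2], [3 4 5], [6 7 8]

whole = a[:, :]       # every row, every column (a view, not a copy)
first_col = a[:, 0]   # every row, column 0
first_row = a[0, :]   # row 0, every column

print(first_col)
print(first_row)
```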

There is a better way: numexpr

Slightly reworded from their main page:

It's a multi-threaded VM written in C that analyzes expressions, rewrites them more efficiently, and compiles them on the fly into code that gets near-optimal parallel performance for both memory-bound and CPU-bound operations.

For example, on my 4-core machine, evaluating a sine is just slightly less than 4 times faster than with numpy.

    In [1]: import numpy as np
    In [2]: import numexpr as ne
    In [3]: a = np.arange(1000000)
    In [4]: timeit ne.evaluate('sin(a)')
    100 loops, best of 3: 15.6 ms per loop
    In [5]: timeit np.sin(a)
    10 loops, best of 3: 54 ms per loop

The numexpr documentation lists the supported functions. You'll have to check it, or give us more information, to see whether your more complicated function can be evaluated by numexpr.
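To give an idea of what a slightly more complicated function looks like in numexpr, here is a minimal sketch. The expression and the second array b are invented for illustration, and it assumes numexpr is installed. numexpr evaluates the whole expression string in chunks, which avoids the full-size temporary arrays that the equivalent NumPy code allocates.

```python
import numpy as np
import numexpr as ne

a = np.arange(1000000, dtype=np.float64)
b = np.linspace(0.0, 1.0, a.size)

# numexpr looks up a and b in the calling scope by name.
result_ne = ne.evaluate('sin(a) ** 2 + 3 * cos(b) * a')

# The same computation in plain NumPy, for comparison.
result_np = np.sin(a) ** 2 + 3 * np.cos(b) * a

print(np.allclose(result_ne, result_np))
```

If the real function can be written as such an expression, timing both versions on your own data is the only reliable way to see whether it pays off.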

Well, this is kind of interesting. Note what happens if you run the following commands:

    import numpy
    from multiprocessing import Pool

    a = numpy.arange(1000000)
    pool = Pool(processes = 5)
    result = pool.map(numpy.sin, a)

    UnpicklingError: NEWOBJ class argument has NULL tp_new

I wasn't expecting that. So what's going on? Well:

    >>> help(numpy.sin)
    Help on ufunc object:

    sin = class ufunc(__builtin__.object)
     |  Functions that operate element by element on whole arrays.
     |
     |  To see the documentation for a specific ufunc, use np.info().  For
     |  example, np.info(np.sin).  Because ufuncs are written in C
     |  (for speed) and linked into Python with NumPy's ufunc facility,
     |  Python's help() function finds this page whenever help() is called
     |  on a ufunc.

Yep, numpy.sin is a ufunc implemented in C and, as the error shows, it could not be pickled here, so you can't really use it directly with multiprocessing.

So we have to wrap it in another function.

perf.py:

    import time
    import numpy
    from multiprocessing import Pool

    def numpy_sin(value):
        # Plain Python wrapper, so the function can be pickled and sent to the workers.
        return numpy.sin(value)

    a = numpy.arange(1000000)
    pool = Pool(processes = 5)

    # Baseline: one vectorized call on the whole array.
    start = time.time()
    result = numpy.sin(a)
    end = time.time()
    print('Singled threaded %f' % (end - start))

    # Pool version: the wrapper is called once per array element.
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()
    print('Multithreaded %f' % (end - start))

    $ python perf.py
    Singled threaded 0.032201
    Multithreaded 10.550432

Wow, I wasn't expecting that either. There are a couple of issues. For starters, we are calling a Python function, even if it's just a wrapper, instead of a pure C function, and pool.map calls it once for each of the million elements. There's also the overhead of copying the values: multiprocessing doesn't share data by default, so each value needs to be copied back and forth.
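The copying cost can be made visible without starting any processes. The sketch below is only an approximation of what multiprocessing does internally: it compares the pickled size of 1000 values sent as individual NumPy scalars (roughly how the elements travel when pool.map iterates over an array) with the same values sent as one array chunk.

```python
import pickle
import numpy

a = numpy.arange(100000)
sample = a[:1000]

# Iterating over an array yields NumPy scalars; each one carries
# pickling overhead on top of its raw bytes.
scalar_bytes = len(pickle.dumps(list(sample)))

# The same 1000 values shipped as a single contiguous array.
chunk_bytes = len(pickle.dumps(sample))

print('as scalars: %d bytes' % scalar_bytes)
print('as a chunk: %d bytes' % chunk_bytes)
```

Byte counts are only part of the story, since every element also costs a Python-level function call in the worker, but they point in the same direction: send few large chunks rather than many tiny items.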

Do note what happens if we properly segment our data:

    import time
    import numpy
    from multiprocessing import Pool

    def numpy_sin(value):
        return numpy.sin(value)

    # Ten chunks of 100000 values each instead of one million single values.
    a = [numpy.arange(100000) for _ in range(10)]
    pool = Pool(processes = 5)

    # numpy.sin has to convert the list of arrays into a single 2-D array first.
    start = time.time()
    result = numpy.sin(a)
    end = time.time()
    print('Singled threaded %f' % (end - start))

    # Pool version: the wrapper is called once per chunk.
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()
    print('Multithreaded %f' % (end - start))

    $ python perf.py
    Singled threaded 0.150192
    Multithreaded 0.055083

So what can we take from this? Multiprocessing is great, but we should always test and compare: sometimes it's faster and sometimes it's slower, depending on how it's used.

Granted, you are not using numpy.sin but another function. I would recommend that you first verify that multiprocessing will indeed speed up the computation, since the overhead of copying values back and forth may affect you.
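One way to run that check is sketched below. It is not a definitive implementation: work is a hypothetical stand-in for the real function, parallel_apply is a helper name invented here, and the chunk and process counts are arbitrary. The array is split with numpy.array_split, each chunk goes to a worker through pool.map, and the pieces are joined again in order.

```python
import time
import numpy
from multiprocessing import Pool

def work(chunk):
    """Hypothetical stand-in for the real, more expensive function."""
    return numpy.sin(chunk) ** 2 + numpy.cos(chunk) ** 2

def parallel_apply(func, data, processes=4, n_chunks=None):
    """Apply func to chunks of data in worker processes and rejoin them."""
    chunks = numpy.array_split(data, n_chunks or processes)
    pool = Pool(processes=processes)
    try:
        # pool.map preserves the order of the chunks.
        parts = pool.map(func, chunks)
    finally:
        pool.close()
        pool.join()
    return numpy.concatenate(parts)

if __name__ == '__main__':
    a = numpy.arange(1000000)

    start = time.time()
    serial = work(a)
    serial_time = time.time() - start

    start = time.time()
    parallel = parallel_apply(work, a, processes=4)
    parallel_time = time.time() - start

    print('serial   %f s' % serial_time)
    print('parallel %f s' % parallel_time)
    print('same result: %s' % numpy.allclose(serial, parallel))
```

The measured parallel time includes starting the pool and copying the chunks, which is exactly the overhead in question. If it is not clearly below the serial time on realistic data, the extra processes are not helping.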

Either way, I do believe that using pool.map is the best, safest method of parallelizing code.

I hope this helps.

Tags:

python

multiprocessing

numpy

numexpr

user1475412

2 Answers

jorgeca

Samy Vilar
