
Is it possible to know which SciPy / NumPy functions run on multiple cores?

I am trying to figure out exactly which SciPy/NumPy functions run on multiple processors. I can read in the SciPy reference manual, for example, that SciPy uses this, but I am more interested in exactly which functions perform parallel computations, because not all of them do. The dream scenario would of course be if it were included when you type help(scipy.foo), but that does not seem to be the case.

Any help will be much appreciated.

Best,

Matias

asked Aug 04 '11 by matiasq


1 Answer

I think the question is better addressed to the BLAS/LAPACK libraries you use rather than to SciPy/NumPy.

Some BLAS/LAPACK libraries, such as MKL, use multiple cores natively, whereas other implementations might not.
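A first step, then, is to find out which BLAS/LAPACK your NumPy build links against. A minimal sketch, assuming a reasonably recent NumPy (the exact fields printed vary by version and build):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this NumPy build was compiled
# against (e.g. MKL, OpenBLAS, or the single-threaded reference BLAS).
np.show_config()
```

If the output names MKL or OpenBLAS, the heavy linear-algebra calls are typically multithreaded; the reference BLAS is not.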

To take scipy.linalg.solve as an example, here's its source code (with some error handling code omitted for clarity):

def solve(a, b, sym_pos=0, lower=0, overwrite_a=0, overwrite_b=0,
          debug=0):
    # a1 and b1 are validated array copies of a and b
    # (the conversion and error-checking code is omitted for clarity).
    if sym_pos:
        posv, = get_lapack_funcs(('posv',), (a1, b1))
        c, x, info = posv(a1, b1,
                          lower=lower,
                          overwrite_a=overwrite_a,
                          overwrite_b=overwrite_b)
    else:
        gesv, = get_lapack_funcs(('gesv',), (a1, b1))
        lu, piv, x, info = gesv(a1, b1,
                                overwrite_a=overwrite_a,
                                overwrite_b=overwrite_b)

    if info == 0:
        return x
    if info > 0:
        raise LinAlgError("singular matrix")
    raise ValueError(
        'illegal value in %d-th argument of internal gesv|posv' % -info)

As you can see, it's just a thin wrapper around two families of LAPACK functions (exemplified by DPOSV and DGESV).
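To see both dispatch paths in action, here is a hedged usage sketch. Note that newer SciPy versions replaced the sym_pos flag with the assume_a parameter (an assumption about your SciPy version; the old flag may be deprecated or removed):

```python
import numpy as np
from scipy.linalg import solve

# A symmetric positive-definite system. With assume_a='pos'
# solve() dispatches to LAPACK's POSV family (Cholesky);
# the default dispatches to GESV (LU factorization).
a = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_general = solve(a, b)                  # ?GESV path
x_posdef = solve(a, b, assume_a='pos')   # ?POSV path
print(x_general, x_posdef)
```

Both calls return the same solution; whether either runs on multiple cores is decided entirely inside the LAPACK library.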

There is no parallelism going on at the SciPy level. If you nevertheless observe the function using multiple cores on your system, the explanation is that your LAPACK library is capable of using multiple cores, without NumPy/SciPy doing anything to make this happen.
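The simplest empirical check is to run a large BLAS-backed operation and watch CPU usage in top or htop while it runs. A minimal sketch:

```python
import time
import numpy as np

# A large matrix product goes through the BLAS routine DGEMM.
# With a threaded BLAS (OpenBLAS, MKL, ...) several cores will be
# busy while this runs; with the reference BLAS, only one.
n = 2000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

t0 = time.perf_counter()
c = a @ b
print(f"{n}x{n} matmul: {time.perf_counter() - t0:.2f}s")
```

The BLAS thread count can usually be capped with environment variables such as OMP_NUM_THREADS or OPENBLAS_NUM_THREADS (which one applies depends on your BLAS), set before the Python process starts.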

answered Oct 06 '22 by NPE