I am trying to figure out exactly which functions in SciPy/NumPy run on multiple processors. The SciPy reference manual mentions that SciPy can do this, but I am more interested in precisely which functions run parallel computations, because not all of them do. The dream scenario would of course be for this to be stated in help(scipy.foo), but that does not seem to be the case.
Any help will be much appreciated.
Best,
Matias
Key takeaways: Python is not a single-threaded language, but in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so pure-Python code effectively runs on one core. Libraries that perform computationally heavy tasks, such as NumPy, SciPy and PyTorch, do that work in compiled code (C, C++, Fortran), which can release the GIL and use multiple cores.
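NumPy itself does not parallelize most operations: element-wise arithmetic and similar functions run on a single CPU core, and NumPy does not use the GPU. The exception is linear algebra, which is handed to a BLAS/LAPACK library that may be multithreaded. Numba, on the other hand, can spread loops across cores when asked to (for example with parallel=True).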
First, NumPy supports multithreading through the BLAS/LAPACK library it is linked against, and this can give you a speed boost in multicore environments. On Linux, I used top to verify that my NumPy was indeed using multiple threads. Second, multithreading can hurt performance when you're running several Python/NumPy processes at once, because the processes then compete for the same cores.
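One common way to avoid that contention is to cap the BLAS thread count of each worker process through environment variables. The sketch below is only an illustration: the helper name single_threaded_env is hypothetical, and which of the three variables actually takes effect depends on the BLAS library your NumPy is linked against (OpenMP-based builds, OpenBLAS or MKL). The variables must be set before NumPy is imported in the process they should affect.

```python
import os
import subprocess
import sys

# Environment variables read by common BLAS/OpenMP runtimes at start-up.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def single_threaded_env(threads=1):
    """Return a copy of the current environment with BLAS threads capped.

    Hypothetical helper for illustration; it does not modify os.environ.
    """
    env = dict(os.environ)
    for name in THREAD_ENV_VARS:
        env[name] = str(threads)
    return env


if __name__ == "__main__":
    # Launch a child interpreter that reports the limit it was given.
    # A real worker would import numpy here instead of printing.
    child_code = "import os; print(os.environ['OMP_NUM_THREADS'])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=single_threaded_env(1),
        capture_output=True,
        text=True,
        check=True,
    )
    print("child sees OMP_NUM_THREADS =", result.stdout.strip())
```

With one BLAS thread per worker, running as many workers as there are cores should keep the machine busy without oversubscribing it.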
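NumPy's core is written in C, which is where its speed comes from. SciPy is a mix of Python and compiled code (C, C++, Fortran and Cython): the high-level interfaces are Python, but the numerical work is done by compiled routines, so it offers vast functionality without being inherently slower.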
I think the question is better addressed to the BLAS/LAPACK libraries you use rather than to SciPy/NumPy.
Some BLAS/LAPACK libraries, such as MKL, use multiple cores natively, whereas other implementations might not.
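To find out which BLAS/LAPACK library your own NumPy build is linked against, you can ask NumPy to print its build configuration. The snippet below is a minimal sketch; the wrapper name blas_lapack_report is hypothetical, and the exact layout of the report differs between NumPy versions.

```python
import contextlib
import io

import numpy as np


def blas_lapack_report():
    """Capture the text that numpy.show_config() prints and return it."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    # Look for names such as "mkl" or "openblas" in the printed report.
    print(blas_lapack_report())
```

If the report names MKL or OpenBLAS, linear algebra calls are likely to be multithreaded; a plain reference BLAS usually is not.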
To take scipy.linalg.solve as an example, here's its source code (with some error handling code omitted for clarity):
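    import numpy as np
    from scipy.linalg import LinAlgError, get_lapack_funcs

    def solve(a, b, sym_pos=0, lower=0, overwrite_a=0, overwrite_b=0,
              debug=0):
        # Input conversion; not shown in the original listing and restored
        # here (as an assumption) so that a1 and b1 are defined.
        a1, b1 = map(np.asarray_chkfinite, (a, b))
        if sym_pos:
            # Symmetric positive definite case: LAPACK ?posv.
            posv, = get_lapack_funcs(('posv',), (a1, b1))
            c, x, info = posv(a1, b1,
                              lower=lower,
                              overwrite_a=overwrite_a,
                              overwrite_b=overwrite_b)
        else:
            # General case: LAPACK ?gesv.
            gesv, = get_lapack_funcs(('gesv',), (a1, b1))
            lu, piv, x, info = gesv(a1, b1,
                                    overwrite_a=overwrite_a,
                                    overwrite_b=overwrite_b)
        if info == 0:
            return x
        if info > 0:
            raise LinAlgError("singular matrix")
        raise ValueError(
            'illegal value in %d-th argument of internal gesv|posv' % -info)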
As you can see, it's just a thin wrapper around two families of LAPACK functions (exemplified by DPOSV and DGESV).
There is no parallelism going on at the SciPy level, yet you observe the function using multiple cores on your system. The only possible explanation is that your LAPACK library is capable of using multiple cores, without NumPy/SciPy doing anything to make this happen.
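If you want to check this on your own machine without watching top, one rough approach is to compare CPU time with wall-clock time for a single call. The sketch below uses numpy.linalg.solve, which also goes through LAPACK's gesv; the function name cpu_to_wall_ratio and the matrix size are arbitrary choices for illustration. A ratio near 1 suggests the call ran on one core, while a ratio well above 1 suggests the LAPACK library used several.

```python
import time

import numpy as np


def cpu_to_wall_ratio(n=1500, seed=0):
    """Time one dense solve and return (cpu_time / wall_time, residual).

    time.process_time() sums CPU time over all threads of the process,
    so it exceeds the wall-clock time when several cores are busy.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    x = np.linalg.solve(a, b)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start

    # Largest absolute error of a @ x against b, as a sanity check.
    residual = float(np.max(np.abs(a @ x - b)))
    return cpu / wall, residual


if __name__ == "__main__":
    ratio, residual = cpu_to_wall_ratio()
    print(f"CPU/wall ratio: {ratio:.2f}, residual: {residual:.2e}")
```

The measurement is noisy for small matrices, so treat it as a hint rather than proof; many threaded BLAS libraries only switch to multiple threads above a certain problem size.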