I made two installations:
1. brew install numpy (and scipy) --with-openblas
2. numpy and scipy built from source
Afterwards I cloned two handy scripts for verifying these libraries in a multi-threaded environment:
git clone https://gist.github.com/3842524.git
Then for each installation I ran show_config:
python -c "import scipy as np; np.show_config()"
Everything looks fine for installation 1:
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
blas_mkl_info:
NOT AVAILABLE
But for installation 2 things are not so bright:
lapack_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3']
define_macros = [('NO_ATLAS_INFO', 3)]
blas_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
define_macros = [('NO_ATLAS_INFO', 3)]
So it seems I failed to link OpenBLAS correctly for the second installation. That's fine for now; here are the performance results. All tests were performed on an iMac running Yosemite with an i7-4790K (4 cores, hyper-threaded).
First installation with OpenBLAS:
numpy:
OMP_NUM_THREADS=1 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.126578998566 sec
OMP_NUM_THREADS=2 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0640147686005 sec
OMP_NUM_THREADS=4 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0360922336578 sec
OMP_NUM_THREADS=8 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0364527702332 sec
scipy:
OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0276656150818 sec
svd: 0.732437372208 sec
OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.0182101726532 sec
svd: 0.441690778732 sec
OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.0130400180817 sec
svd: 0.316107988358 sec
OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.012854385376 sec
svd: 0.315939807892 sec
Second installation without OpenBLAS:
numpy:
OMP_NUM_THREADS=1 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0371072292328 sec
OMP_NUM_THREADS=2 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0215149879456 sec
OMP_NUM_THREADS=4 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0146862030029 sec
OMP_NUM_THREADS=8 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0141334056854 sec
scipy:
OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0109382152557 sec
svd: 0.32529540062 sec
OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.00988121032715 sec
svd: 0.331357002258 sec
OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.00916676521301 sec
svd: 0.318637990952 sec
OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.00931282043457 sec
svd: 0.324427986145 sec
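To make the two sets of timings easier to compare, here is a small sketch that computes the scaling within each installation and the ratio between them. The numbers are copied from the runs above and rounded to six decimals; the labels install_1 (Homebrew, OpenBLAS) and install_2 (built from source) and the helper names are only illustrative.

```python
# Timings in seconds, copied from the runs above and rounded to 6 decimals.
# Keys are (installation, operation); inner keys are OMP_NUM_THREADS values.
# "install_1" / "install_2" are illustrative labels, not names from the scripts.
TIMINGS = {
    ("install_1", "dot"): {1: 0.126579, 2: 0.064015, 4: 0.036092, 8: 0.036453},
    ("install_1", "cholesky"): {1: 0.027666, 2: 0.018210, 4: 0.013040, 8: 0.012854},
    ("install_1", "svd"): {1: 0.732437, 2: 0.441691, 4: 0.316108, 8: 0.315940},
    ("install_2", "dot"): {1: 0.037107, 2: 0.021515, 4: 0.014686, 8: 0.014133},
    ("install_2", "cholesky"): {1: 0.010938, 2: 0.009881, 4: 0.009167, 8: 0.009313},
    ("install_2", "svd"): {1: 0.325295, 2: 0.331357, 4: 0.318638, 8: 0.324428},
}


def speedup(install, op, threads):
    """Speedup of `threads` threads over one thread within one installation."""
    runs = TIMINGS[(install, op)]
    return runs[1] / runs[threads]


def ratio(op, threads):
    """How many times slower install_1 is than install_2 at a given thread count."""
    return TIMINGS[("install_1", op)][threads] / TIMINGS[("install_2", op)][threads]


for op in ("dot", "cholesky", "svd"):
    for threads in (1, 2, 4, 8):
        print(
            f"{op:9s} threads={threads}: "
            f"install_1 x{speedup('install_1', op, threads):.2f}, "
            f"install_2 x{speedup('install_2', op, threads):.2f}, "
            f"install_1/install_2 = {ratio(op, threads):.2f}"
        )
```

A ratio above 1 means the second installation was faster for that operation and thread count.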
To my surprise, the second installation is faster than the first. In the case of scipy, adding more threads gives no increase in performance, yet even a single thread is faster (cholesky) or about as fast (svd) as four threads with OpenBLAS.
Does anyone have an idea why that is?
There are two obvious differences that might account for the discrepancy:
1. You are comparing two different versions of numpy. The OpenBLAS-linked version you installed using Homebrew is 1.9.2, whereas the one you built from source is 1.10.0.dev0+3c5409e.
2. Whilst the newer version is not linked against OpenBLAS, it is linked against Apple's Accelerate Framework, a different optimized BLAS implementation.
The reason your test script still reports slow blas for the second case is an incompatibility with the newest versions of numpy. The script you are using tests whether numpy is linked against an optimised BLAS library by checking for the presence of numpy.core._dotblas:
try:
    import numpy.core._dotblas
    print('FAST BLAS')
except ImportError:
    print('slow blas')
In older versions of numpy, this C module would only be compiled during the installation process if an optimized BLAS library was found. However, _dotblas has been removed altogether from the 1.10.0 development versions onward (as mentioned in this previous SO question), so the script will always report slow blas for these versions.
I've written an updated version of the numpy test script that reports the BLAS linkage correctly for the latest versions; you can find it here.
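For illustration, a check that does not depend on _dotblas could look roughly like the sketch below. It is not the updated script mentioned above, just a heuristic that captures the text printed by numpy.show_config() and looks for backend names on the lines that list libraries or link arguments. The function names and the keyword list are assumptions made for this example, and the layout of the show_config output differs between numpy versions, so treat the result as a hint rather than a guarantee.

```python
import contextlib
import io

# Backend labels and the keywords searched for. These keywords are assumptions
# about what each backend leaves in the show_config output.
BACKENDS = (
    ("OpenBLAS", "openblas"),
    ("Accelerate", "accelerate"),
    ("MKL", "mkl"),
    ("ATLAS", "atlas"),
)

# Only lines starting with these prefixes are inspected, so section headers
# such as "blas_mkl_info:" and macros such as NO_ATLAS_INFO are ignored.
RELEVANT_PREFIXES = ("libraries", "extra_link_args", "name:")


def detect_blas(config_text):
    """Guess the BLAS backend from show_config output; None if nothing matches."""
    relevant = [
        line.lower()
        for line in config_text.splitlines()
        if line.strip().lower().startswith(RELEVANT_PREFIXES)
    ]
    joined = "\n".join(relevant)
    for label, keyword in BACKENDS:
        if keyword in joined:
            return label
    return None


def numpy_config_text():
    """Return whatever numpy.show_config() prints, as a string."""
    import numpy  # imported lazily so detect_blas works on saved output too

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    try:
        print(detect_blas(numpy_config_text()) or "no known BLAS backend detected")
    except ImportError:
        print("numpy is not installed")
```

Fed with the two show_config outputs from the question, detect_blas should return OpenBLAS for the first installation and Accelerate for the second.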