I am trying to evaluate the performance of numpy linked to ATLAS compared to numpy linked to OpenBLAS. I get some strange results for ATLAS, which I describe below.
The Python code for evaluating matrix-matrix multiplication (aka sgemm) looks like this:
import sys
sys.path.insert(0, "numpy-1.8.1")
import numpy
import timeit
for i in range(100, 501, 100):
    setup = "import numpy; m1 = numpy.random.rand(%d, %d).astype(numpy.float32)" % (i, i)
    timer = timeit.Timer("numpy.dot(m1, m1)", setup)
    times = timer.repeat(100, 1)
    print "%3d" % i,
    print "%7.4f" % numpy.mean(times),
    print "%7.4f" % numpy.min(times),
    print "%7.4f" % numpy.max(times)
If I run this script with numpy linked to ATLAS, I get large variations in the measured time. The first column shows the matrix size, followed by the mean, min and max of the execution times (in seconds) obtained by running the matrix-matrix multiplication 100 times:
100 0.0003 0.0003 0.0004
200 0.0023 0.0010 0.0073
300 0.0052 0.0026 0.0178
400 0.0148 0.0066 0.0283
500 0.0295 0.0169 0.0531
If I repeat this procedure with numpy linked to OpenBLAS using one thread the running times are much more stable:
100 0.0002 0.0002 0.0003
200 0.0014 0.0014 0.0015
300 0.0044 0.0044 0.0047
400 0.0102 0.0101 0.0105
500 0.0169 0.0168 0.0177
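For reference, one common way to restrict OpenBLAS to a single thread is the OPENBLAS_NUM_THREADS environment variable. The sketch below is only an assumption about how such a run could be set up (the post does not say how the thread count was fixed), and the 200 x 200 matrix is an arbitrary choice:

```python
import os

# Assumption: the variable has to be set before numpy (and with it OpenBLAS)
# is loaded, so this line comes before the numpy import.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy

# Small float32 product, just to exercise the BLAS call behind numpy.dot.
m1 = numpy.random.rand(200, 200).astype(numpy.float32)
product = numpy.dot(m1, m1)
print(product.shape)
```

The variable has no effect on a numpy that is linked to a different BLAS.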
Can anybody explain this observation?
Edit: Additional information:
The observed min and max values for ATLAS are not outliers; the times are distributed over the given range.
I uploaded the ATLAS times for i=500 at https://gist.github.com/uweschmitt/768bd165477d7c14095e
The given times come from a different run, so avg, min and max values differ slightly.
Edit: Additional finding:
Could CPU throttling (http://www.scipy.org/scipylib/building/linux.html#step-1-disable-cpu-throttling) be the cause? I do not know enough about CPU throttling to judge its impact on my measurements. Regrettably, I cannot enable or disable it on my target machine.
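Even without permission to change the setting, it may be possible to at least read the current frequency-scaling governor. This is a minimal sketch that assumes a Linux machine exposing the usual cpufreq files under /sys; the helper name read_governors is hypothetical, and on systems without these files it simply returns an empty dict:

```python
import glob
import os

def read_governors(pattern="/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    """Return {cpu_name: governor} for every readable cpufreq governor file."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpu0/cpufreq/scaling_governor -> "cpu0"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as fh:
                governors[cpu] = fh.read().strip()
        except OSError:
            # Unreadable entry: skip it rather than fail.
            pass
    return governors

if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information available")
    for cpu, governor in found.items():
        print(cpu, governor)
```

A governor such as "ondemand" or "powersave" would mean the clock frequency can change during the benchmark, whereas "performance" keeps it high.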
I cannot reproduce this, but I think I know the reason. I am using NumPy 1.8.1 on a 64-bit Linux box.
First, my results with ATLAS (I have added the standard deviation in the last column):
100 0.0003 0.0002 0.0025 0.0003
200 0.0012 0.0010 0.0067 0.0006
300 0.0028 0.0026 0.0047 0.0004
400 0.0070 0.0059 0.0089 0.0004
500 0.0122 0.0109 0.0149 0.0009
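The extra column can be produced with a small variation of the script from the question. This is a sketch rather than the exact code behind the numbers above: it uses Python 3 print syntax, times a callable instead of a statement string, and the function name bench is made up for illustration.

```python
import timeit

import numpy

def bench(n, repeats=100):
    """Time numpy.dot on an n x n float32 matrix.

    Returns (mean, min, max, std) of the measured times in seconds.
    """
    m1 = numpy.random.rand(n, n).astype(numpy.float32)
    timer = timeit.Timer(lambda: numpy.dot(m1, m1))
    # One multiplication per measurement, repeated `repeats` times.
    times = timer.repeat(repeats, 1)
    return numpy.mean(times), numpy.min(times), numpy.max(times), numpy.std(times)

if __name__ == "__main__":
    for n in range(100, 501, 100):
        print("%3d %7.4f %7.4f %7.4f %7.4f" % ((n,) + bench(n)))
```

A standard deviation that is small relative to the mean indicates stable timings; a large one corresponds to the scatter described in the question.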
And now, the results with MKL provided by Anaconda:
100 0.0003 0.0001 0.0155 0.0015
200 0.0005 0.0005 0.0006 0.0000
300 0.0018 0.0017 0.0021 0.0001
400 0.0039 0.0038 0.0042 0.0001
500 0.0079 0.0077 0.0084 0.0002
MKL is faster, but the spread is consistent.
ATLAS is tuned at compile time: it tries different configurations and algorithms and keeps the fastest for your particular hardware. If you install a precompiled version, you are using the optimal configuration for the build machine, not for yours. This misconfiguration is the probable cause of the spread. In my case, I have compiled ATLAS myself.
In contrast, OpenBLAS is hand-tuned for each specific architecture, so any binary install will be equivalent. MKL decides dynamically.
This is what happens if I run the script on Numpy installed from the repositories and linked with a pre-compiled ATLAS (SSE3 not activated):
100 0.0007 0.0003 0.0064 0.0007
200 0.0021 0.0015 0.0090 0.0009
300 0.0050 0.0040 0.0114 0.0010
400 0.0113 0.0101 0.0186 0.0011
500 0.0217 0.0192 0.0329 0.0020
These numbers are more similar to your data.
For completeness, I asked a friend to run the snippet on her machine, which has numpy installed from the Ubuntu repositories and no ATLAS, so Numpy is falling back to its crappy default:
100 0.0007 0.0007 0.0008 0.0000
200 0.0058 0.0053 0.0107 0.0014
300 0.0178 0.0175 0.0188 0.0003
400 0.0418 0.0401 0.0528 0.0014
500 0.0803 0.0797 0.0818 0.0004
So, what may be happening?
You have a non-optimal installation of ATLAS, and that is why you get such scatter. My numbers were run on an Intel i5 CPU @ 1.7 GHz in a laptop. I don't know which machine you have, but I doubt it is almost three times slower than mine. This suggests ATLAS is not fully optimised.
How can I be sure?
Running numpy.show_config() will tell you which libraries it is linked to, and where they are. The output is something like this:
atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib64/atlas-sse3']
    define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')]
    language = f77
    include_dirs = ['/usr/include']
blas_opt_info:
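If you want to inspect that output from a script, it can be captured as text. This is a rough sketch: the helper name blas_config_text is made up, the layout of the output differs between NumPy versions, and the substring search is only a crude hint, not a reliable detection of the linked library.

```python
import contextlib
import io

import numpy

def blas_config_text():
    """Return whatever numpy.show_config() prints, as a single string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        numpy.show_config()
    return buf.getvalue()

if __name__ == "__main__":
    text = blas_config_text()
    print(text)
    # Crude hint only: report which well-known library names appear.
    for name in ("atlas", "openblas", "mkl"):
        if name in text.lower():
            print("mentions:", name)
```

The library_dirs entry is the part to look at: it shows which copy of the library on disk is actually being picked up.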
If this is true, how do I fix it?
You may have a stale precompiled binary ATLAS (it is a dependency of some packages), or the flags you used to compile it are wrong. The smoothest solution is to build the RPMs from source. Here are instructions for CentOS.
Note that OpenBLAS is not compatible (yet) with multiprocessing, so be aware of the limitations. If your work is very heavy on linear algebra, MKL is the best option, but it is expensive. Academics can get it for free through the Continuum Anaconda Python distribution, and many universities have a campus-wide licence.