Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

performance of NumPy with different BLAS implementations

I'm running an algorithm that is implemented in Python and uses NumPy. The most computationally expensive part of the algorithm involves solving a set of linear systems (i.e. a call to numpy.linalg.solve(). I came up with this small benchmark:

import numpy as np
import time

# Create two large random matrices
a = np.random.randn(5000, 5000)
b = np.random.randn(5000, 5000)

t1 = time.time()
# That's the expensive call:
np.linalg.solve(a, b)
print time.time() - t1

I've been running this on:

  1. My laptop, a late 2013 MacBook Pro 15" with 4 cores at 2GHz (sysctl -n machdep.cpu.brand_string gives me Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz)
  2. An Amazon EC2 c3.xlarge instance, with 4 vCPUs. Amazon advertises them as "High Frequency Intel Xeon E5-2680 v2 (Ivy Bridge) Processors"

Bottom line:

  • On the Mac it runs in ~4.5 seconds
  • On the EC2 instance it runs in ~19.5 seconds

I have tried it also on other OpenBLAS / Intel MKL based setups, and the runtime is always comparable to what I get on the EC2 instance (modulo the hardware config.)

Can anyone explain why the performance on Mac (with the Accelerate Framework) is > 4x better? More details about the NumPy / BLAS setup in each are provided below.

Laptop setup

numpy.show_config() gives me:

atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

EC2 instance setup:

On Ubuntu 14.04, I installed OpenBLAS with

sudo apt-get install libopenblas-base libopenblas-dev

When installing NumPy, I created a site.cfg with the following contents:

[default]
library_dirs= /usr/lib/openblas-base

[atlas]
atlas_libs = openblas

numpy.show_config() gives me:

atlas_threads_info:
    libraries = ['lapack', 'openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('ATLAS_INFO', '"\\"None\\""')]
    language = f77
    include_dirs = ['/usr/include/atlas']
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    language = f77
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'openblas']
    library_dirs = ['/usr/lib']
    define_macros = [('ATLAS_INFO', '"\\"None\\""')]
    language = f77
    include_dirs = ['/usr/include/atlas']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
like image 995
lum Avatar asked Oct 22 '14 15:10

lum


1 Answers

The reason for this behavior could be that Accelerate uses multithreading, while the others don't.

Most BLAS implementations follow the environment variable OMP_NUM_THREADS to determine how many threads to use. I believe they only use 1 thread if not told otherwise explicitly. Accelerate's man page, however sounds like threading is turned on by default; it can be turned off by setting the environment variable VECLIB_MAXIMUM_THREADS.

To determine if this is really what's happening, try

export VECLIB_MAXIMUM_THREADS=1

before calling the Accelerate version, and

export OMP_NUM_THREADS=4

for the other versions.

Independent of whether this is really the reason, it's a good idea to always set these variables when you use BLAS to be sure you control what is going on.

like image 135
Elmar Peise Avatar answered Oct 08 '22 01:10

Elmar Peise