
Why does copying a >= 16 GB Numpy array set all its elements to 0?

On my Anaconda Python distribution, copying a Numpy array that is exactly 16 GB or larger (regardless of dtype) sets all elements of the copy to 0:

>>> np.arange(2 ** 31 - 1).copy()  # works fine
array([         0,          1,          2, ..., 2147483644, 2147483645,
       2147483646])
>>> np.arange(2 ** 31).copy()  # wait, what?!
array([0, 0, 0, ..., 0, 0, 0])
>>> np.arange(2 ** 32 - 1, dtype=np.float32).copy()
array([  0.00000000e+00,   1.00000000e+00,   2.00000000e+00, ...,
         4.29496730e+09,   4.29496730e+09,   4.29496730e+09], dtype=float32)
>>> np.arange(2 ** 32, dtype=np.float32).copy()
array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)

Here is np.__config__.show() for this distribution:

blas_opt_info:
    library_dirs = ['/users/username/.anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/users/username/.anaconda3/include']
    libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
    library_dirs = ['/users/username/.anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/users/username/.anaconda3/include']
    libraries = ['mkl_rt', 'pthread']
mkl_info:
    library_dirs = ['/users/username/.anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/users/username/.anaconda3/include']
    libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    library_dirs = ['/users/username/.anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/users/username/.anaconda3/include']
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    library_dirs = ['/users/username/.anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/users/username/.anaconda3/include']
    libraries = ['mkl_rt', 'pthread']

For comparison, here is np.__config__.show() for my system Python distribution, which does not have this problem:

blas_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
openblas_lapack_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
openblas_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
lapack_opt_info:
    define_macros = [('HAVE_CBLAS', None)]
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
blas_mkl_info:
  NOT AVAILABLE

I'm wondering if the MKL acceleration is the problem. I've reproduced the bug on both Python 2 and 3.

asked Feb 05 '17 by 1''

1 Answer

This is just a guess; I don't have any evidence supporting the following claims at the moment, but my guess is that this is a simple integer-overflow problem:

>>> np.arange(2 ** 31 - 1).size
2147483647

which just happens to be the largest int32 value:

>>> np.iinfo(np.int32)
iinfo(min=-2147483648, max=2147483647, dtype=int32)
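The wraparound itself is easy to demonstrate (a hypothetical illustration using ctypes to mimic a signed 32-bit integer; this is not NumPy's actual internals):

```python
import ctypes

# Storing 2**31 in a signed 32-bit integer wraps around to a
# negative value, since the maximum representable value is 2**31 - 1.
size = 2 ** 31
wrapped = ctypes.c_int32(size).value
print(wrapped)  # -2147483648
```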

So if an array actually has a size of 2147483648 (2**31) and that size is stored in an int32, it overflows and wraps to a negative value. Then there is probably something like this inside the numpy.ndarray.copy method:

for (i = 0; i < size; i++) {
    newarray[i] = oldarray[i];
}

But given that the size is now negative, the loop body would never execute, because the condition 0 < -2147483648 is false from the start.
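As a toy sketch of this hypothesis in Python (the int32 truncation is simulated with ctypes; this is not NumPy's actual copy code, just an illustration of how a wrapped size would leave the zero-initialized destination untouched):

```python
import ctypes

def simulated_copy(src, reported_size):
    """Copy src into a zero-filled buffer, using a size that has been
    squeezed through a signed 32-bit integer (as hypothesized above)."""
    size = ctypes.c_int32(reported_size).value  # may wrap to a negative value
    dst = [0] * len(src)                        # destination starts zeroed
    i = 0
    while i < size:                             # never true if size < 0
        dst[i] = src[i]
        i += 1
    return dst

print(simulated_copy([1, 2, 3], 3))        # [1, 2, 3] -- size fits in int32
print(simulated_copy([1, 2, 3], 2 ** 31))  # [0, 0, 0] -- size wrapped negative
```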

That the new array comes back initialized with zeros is strange, because there would be no point in zeroing memory that is about to be overwritten by the copy (but it could be something like in this question).

Again: that's just a guess at this point, but it would match the behaviour.

answered Oct 24 '22 by MSeifert