On my Anaconda Python distribution, copying a NumPy array that is 16 GiB or larger (regardless of dtype) sets all elements of the copy to 0:
>>> np.arange(2 ** 31 - 1).copy() # works fine
array([ 0, 1, 2, ..., 2147483644, 2147483645,
2147483646])
>>> np.arange(2 ** 31).copy() # wait, what?!
array([0, 0, 0, ..., 0, 0, 0])
>>> np.arange(2 ** 32 - 1, dtype=np.float32).copy()
array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ...,
4.29496730e+09, 4.29496730e+09, 4.29496730e+09], dtype=float32)
>>> np.arange(2 ** 32, dtype=np.float32).copy()
array([ 0., 0., 0., ..., 0., 0., 0.], dtype=float32)
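For reference, the sizes involved can be checked without allocating anything. This is a minimal sketch, assuming np.arange defaults to int64 (8 bytes per element), as it does on 64-bit Linux; the ITEMSIZE table and the nbytes helper are names made up for illustration:

```python
# Item sizes in bytes; int64 is assumed to be the default dtype of np.arange here.
ITEMSIZE = {"int64": 8, "float32": 4}
GIB = 2 ** 30

def nbytes(n_elements, dtype):
    """Bytes needed by a 1-D array of n_elements items, without allocating it."""
    return n_elements * ITEMSIZE[dtype]

cases = [
    (2 ** 31 - 1, "int64"),    # copy works
    (2 ** 31, "int64"),        # copy is all zeros
    (2 ** 32 - 1, "float32"),  # copy works
    (2 ** 32, "float32"),      # copy is all zeros
]
for n, dtype in cases:
    size = nbytes(n, dtype)
    print(n, dtype, size, "at or above 16 GiB:", size >= 16 * GIB)
```

Both failing cases come out at exactly 2 ** 34 bytes (16 GiB), and both working cases are just below it.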
Here is the output of np.__config__.show() for this distribution:
blas_opt_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
NOT AVAILABLE
lapack_mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
library_dirs = ['/users/username/.anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/users/username/.anaconda3/include']
libraries = ['mkl_rt', 'pthread']
For comparison, here is the output of np.__config__.show() for my system Python distribution, which does not have this problem:
blas_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
openblas_lapack_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
openblas_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
lapack_opt_info:
define_macros = [('HAVE_CBLAS', None)]
libraries = ['openblas', 'openblas']
language = c
library_dirs = ['/usr/local/lib']
blas_mkl_info:
NOT AVAILABLE
I'm wondering if the MKL acceleration is the problem. I've reproduced the bug on both Python 2 and 3.
This is just a guess, and I don't have any evidence for the following claims at the moment, but I suspect this is a simple overflow problem:
>>> np.arange(2 ** 31 - 1).size
2147483647
That just happens to be the largest int32 value:
>>> np.iinfo(np.int32)
iinfo(min=-2147483648, max=2147483647, dtype=int32)
So when an array has a size of 2147483648 (2**31) and that size is stored in an int32, the value overflows and wraps around to a negative number. Then there is probably something like this inside the numpy.ndarray.copy method:
for (i = 0; i < size; i++) {
    newarray[i] = oldarray[i];
}
But since the size is now negative, the loop would never execute, because the condition 0 < -2147483648 is false from the start.
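The guessed mechanism can be mimicked in pure Python. This is only a sketch under stated assumptions, not NumPy's actual copy code: as_int32 and hypothetical_copy are made-up helpers, the destination is assumed to start out zero-filled, and ctypes.c_int32 is used because it wraps out-of-range values the way a C int32 would.

```python
import ctypes

def as_int32(n):
    """Reinterpret n as a C int32; ctypes does no overflow checking, so it wraps."""
    return ctypes.c_int32(n).value

def hypothetical_copy(src, size):
    """Mimic the guessed C loop; dst starting as zeros is an assumption."""
    dst = [0] * len(src)
    i = 0
    while i < size:  # never true when size is negative
        dst[i] = src[i]
        i += 1
    return dst

print(as_int32(2 ** 31 - 1))  # 2147483647
print(as_int32(2 ** 31))      # -2147483648

src = [1, 2, 3]
print(hypothetical_copy(src, len(src)))           # [1, 2, 3]
print(hypothetical_copy(src, as_int32(2 ** 31)))  # [0, 0, 0]
```

With a wrapped (negative) size the loop body never runs, so the destination keeps whatever it was initialized with. The sketch only illustrates the mechanism; it says nothing about which counter inside NumPy or MKL, if any, actually wraps.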
It is strange that the new array ends up filled with zeros, because it wouldn't make sense to write zeros into the array before copying into it (but it could be something like in this question).
Again, that's just a guess at this point, but it would match the behaviour.