So, in numpy 1.8.2 (with Python 2.7.6) there seems to be an issue with array division. When performing in-place division of a sufficiently large array (at least 8192 elements, more than one dimension; the data type is irrelevant) by a part of itself, the behaviour is inconsistent between notations.
import numpy as np
arr = np.random.rand(2, 5000)
arr_copy = arr.copy()
arr_copy = arr_copy / arr_copy[0]  # out-of-place division
arr /= arr[0]                      # in-place division
print np.sum(arr != arr_copy), arr.size - np.sum(np.isclose(arr, arr_copy))
Both counts are expected to be 0, as the two divisions should give the same result, but the output is 1808. Is this a bug? Does it also happen in other numpy versions?
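For reference, dividing by an explicit copy of the first row removes the aliasing between the array being written and the divisor, so the in-place result matches the out-of-place one on any numpy release (a small sanity-check sketch, not from the original post):

```python
import numpy as np

arr = np.random.rand(2, 5000)
expected = arr / arr[0]   # out-of-place division: always correct

arr /= arr[0].copy()      # in-place, but dividing by a copy, so the
                          # divisor cannot change mid-operation

# Row 0 divided by itself is all ones; row 1 matches the out-of-place result.
print(np.allclose(arr, expected))  # True
print(np.allclose(arr[0], 1.0))    # True
```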
It's not really a bug; as you suggest in the question, it is to do with the buffer size. Setting a larger buffer size gets rid of the problem (for now...):
>>> np.setbufsize(8192*4) # sets a new buffer size, returns the previous size
8192
>>> # same set up as in the question
>>> np.sum(arr != arr_copy), arr.size - np.sum(np.isclose(arr, arr_copy))
(0, 0)
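Since np.setbufsize returns the previous size, the change can also be applied temporarily and then undone. A hedged sketch (np.setbufsize may not exist in newer numpy releases, so it is guarded here):

```python
import numpy as np

# np.setbufsize returns the previous buffer size, so it can be restored.
# The function may be absent from newer numpy releases, hence the guard.
if hasattr(np, "setbufsize"):
    old = np.setbufsize(8192 * 4)   # enlarge the ufunc buffer
    try:
        arr = np.random.rand(2, 5000)
        arr /= arr[0]               # the whole array now fits in one buffer
    finally:
        np.setbufsize(old)          # restore the previous size
```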
And as you state in the comment, the in-place division arr /= arr[0] is where this originally goes wrong. Only the first 8192 elements of arr are buffered, with arr[0] simply being a view of the first row of arr.
This means that all 5000 values in the first row are correctly divided by themselves, and the second row is also correct up to index 3192 (the first buffer holds 8192 elements: the 5000 of row 0 plus the first 3192 of row 1). Next, the remaining 1808 values are put into the buffer for the in-place division, but by then the first row has already changed: arr[0] is now simply a view of a row of ones, so the values in the later columns are just divided by one.
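The element counts above follow from a little arithmetic (pure-Python sketch; 8192 is numpy's default ufunc buffer size):

```python
bufsize = 8192             # numpy's default ufunc buffer size, in elements
rows, cols = 2, 5000
total = rows * cols        # 10000 elements in all

row1_buffered = bufsize - cols   # row-1 elements in the first buffer: 3192
remaining = total - bufsize      # elements processed only after row 0
                                 # has been overwritten with ones: 1808

print(row1_buffered, remaining)  # 3192 1808
```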