
Unexpected behaviour in numpy, when dividing arrays

In numpy 1.8.2 (with Python 2.7.6) there seems to be an issue with array division. When a sufficiently large array (at least 8192 elements, more than one dimension, any data type) is divided in place by a part of itself, the result differs depending on the notation used.

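import numpy as np

arr = np.random.rand(2, 5000)
arr_copy = arr.copy()

# out-of-place division: the result is written to a new array
arr_copy = arr_copy / arr_copy[0]
# in-place division: arr[0] is a view into the array being overwritten
arr /= arr[0]

# number of differing elements, by exact and by approximate comparison
print(np.sum(arr != arr_copy), arr.size - np.sum(np.isclose(arr, arr_copy)))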

The output is expected to be 0, as the two divisions should give the same result, but it is 1808. Is this a bug? Does it also happen in other numpy versions?

Dschoni asked Nov 12 '15 15:11


1 Answer

It's not really a bug; it has to do with the buffer size, as you suggest in the question. Setting a larger buffer size gets rid of the problem (for now...):

>>> np.setbufsize(8192*4) # sets new buffer size, returns current size
8192 
>>> # same set up as in the question
>>> np.sum(arr != arr_copy), arr.size - np.sum(np.isclose(arr, arr_copy))
(0, 0)

And as you state in the comment, the in-place division arr /= arr[0] is where this originally goes wrong. Only the first 8192 elements of arr are buffered at a time, and arr[0] is simply a view of the first row of arr.

This means that all 5000 values in the first row will be correctly divided by themselves, and the second row will also be correct up to index 3192 (8192 - 5000 = 3192). Next, the remaining 1808 values are put into the buffer for the in-place division, but the first row has already changed: arr[0] is now simply a view of a row of ones, so the values in the latter columns will just be divided by one.
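The chunking described above can be mimicked in plain Python to see where the 1808 comes from. This is only a rough model of the behaviour, not numpy's actual implementation: the helper names (make_data, buffered_inplace_divide, count_mismatches) are made up for illustration, and deterministic values replace np.random.rand so that the count is reproducible.

```python
ROWS, COLS = 2, 5000   # same shape as the array in the question
BUFFER_SIZE = 8192     # buffer size (in elements) reported by np.setbufsize


def make_data():
    """Deterministic stand-in for np.random.rand: 2.0, 3.0, ... (never 0 or 1)."""
    return [float(i + 2) for i in range(ROWS * COLS)]


def buffered_inplace_divide(flat, cols, buffer_size):
    """Divide each row by row 0 in place, one buffer-sized chunk at a time.

    Hypothetical model: the divisor is read from the *live* first row when
    each chunk is loaded, which is the aliasing the answer describes.
    """
    for start in range(0, len(flat), buffer_size):
        stop = min(start + buffer_size, len(flat))
        numerators = flat[start:stop]                            # load the chunk
        divisors = [flat[i % cols] for i in range(start, stop)]  # row 0 as it is now
        flat[start:stop] = [n / d for n, d in zip(numerators, divisors)]  # write back


def count_mismatches(buffer_size):
    original = make_data()
    # Reference result: divide by an untouched snapshot of the first row.
    expected = [original[i] / original[i % COLS] for i in range(len(original))]
    work = list(original)
    buffered_inplace_divide(work, COLS, buffer_size)
    return sum(1 for got, want in zip(work, expected) if got != want)


small = count_mismatches(BUFFER_SIZE)      # second chunk sees a first row of ones
large = count_mismatches(BUFFER_SIZE * 4)  # whole array fits in a single chunk
print(small, large)  # prints: 1808 0
```

In this model the first chunk covers the whole first row plus 3192 elements of the second, so the last 10000 - 8192 = 1808 elements are divided by ones; once the buffer holds all 10000 elements, the mismatch disappears.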

Alex Riley answered Nov 08 '22 05:11