 

numpy data type casting behaves differently in x=x+a and x+=a

I noticed that x=x+a and x+=a behave differently when manipulating some numpy arrays in Python.

What I was trying to do is simply add some random errors to an integer array, like this:

import numpy
x=numpy.arange(12)
a=numpy.random.random(size=12)
x+=a

but printing x shows the unchanged integer values [0,1,2,3,4,5,6,7,8,9,10,11].

It turns out that if I use x=x+a instead, it works as expected.
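A minimal sketch comparing the two spellings, assuming a recent NumPy release. Recent versions refuse the float-to-integer cast in `x += a` and raise a `TypeError`, while the old versions named in the question silently truncated. Passing `casting="unsafe"` to `np.add` with `out=x` reproduces that old truncation explicitly.

```python
import numpy as np

x = np.arange(12)
a = np.random.random(size=12)

# x = x + a allocates a new array; int + float is upcast to float64
y = x + a
print(y.dtype)  # float64

# x += a must write into the existing integer buffer. Recent NumPy versions
# reject the float64 -> int cast under the default 'same_kind' rule.
try:
    x += a
except TypeError as err:
    print("in-place add refused:", err)

# Forcing the unsafe cast reproduces the old silent truncation:
# each value n + r with r in [0, 1) is truncated back to n.
np.add(x, a, out=x, casting="unsafe")
print(x)
```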

Is this something we should be aware of? The two forms behave so differently. I used to think that x+=a and x=x+a were completely equivalent, and I have been using them interchangeably without paying attention. Now I am worried about all the computations I have done so far: who knows when and where this has caused a problem, and I will have to go back and double-check everything.

Is this a bug in numpy? I have tested numpy versions 1.2.0 and 1.6.1, and both behave this way.

asked May 01 '14 at 12:05 by Jason




2 Answers

No, this is not a bug; this is intended behavior. += does an in-place addition, so it can't change the data type of the array x. When the dtype is integral, the floating-point temporaries produced by adding in the elements of a get truncated to integers. Since np.random.random returns floats in the range [0, 1), the result is always truncated back to the original values in x. (Newer NumPy releases, 1.10 and later, refuse this float-to-integer cast in in-place operations by default and raise a TypeError instead of truncating silently.)

By contrast, x + a needs to allocate a new array anyway, and upcasts the dtype of that new array to float when one argument is float and the other is integral.
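The dtype rules described above can be checked directly with NumPy's type-promotion helpers. This sketch uses `np.result_type` to show the dtype that `x + a` gets, and `np.can_cast` to show why storing a float result back into an integer array is not a `'same_kind'` cast (the default rule for in-place operations in current NumPy).

```python
import numpy as np

x = np.arange(12)              # integer dtype (int64 on most platforms)
a = np.random.random(size=12)  # float64

# dtype of the freshly allocated result of x + a
print(np.result_type(x, a))    # float64

# x += a must store a float64 result back into x's integer buffer.
# float -> int is not a 'same_kind' cast, only an 'unsafe' one.
print(np.can_cast(a.dtype, x.dtype, casting="same_kind"))  # False
print(np.can_cast(a.dtype, x.dtype, casting="unsafe"))     # True
```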

The best way to avoid this problem is to be explicit about the required dtype in the arange call:

import numpy as np
x = np.arange(12, dtype=float)
x += np.random.random(size=12)
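A short sketch of the fix, plus an alternative for the case where the integer array already exists: make a float copy with `astype` before adding in place. The name `counts` is hypothetical and only stands in for such a pre-existing array.

```python
import numpy as np

# Option 1: create the array with a float dtype from the start
x = np.arange(12, dtype=float)
x += np.random.random(size=12)

# Option 2: the integer array already exists, so make a float copy first
counts = np.arange(12)          # hypothetical pre-existing integer array
noisy = counts.astype(float)    # new float64 array; counts is unchanged
noisy += np.random.random(size=12)

print(x.dtype, noisy.dtype)     # float64 float64
print(counts.dtype.kind)        # i (the original stays integral)
```

`astype` always returns a new array by default, so the original integer data is left untouched.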

(Note that x += a and x = x + a are not always equivalent in Python, since the former typically modifies the object pointed to by x in place. E.g. with pure Python lists:

a = []
b = a
a += [1]

modifies b as well, while a = a + [1] would leave b untouched.)
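The aliasing difference can be checked with `is`. This sketch runs both spellings of the list example side by side, using separate names so each case starts from a fresh list.

```python
# In-place += mutates the list that both names refer to
a = []
b = a              # b and a name the same list
a += [1]           # list.__iadd__ extends the list in place
print(b, a is b)   # [1] True

# c = c + [1] builds a new list and rebinds only c
c = []
d = c
c = c + [1]        # list.__add__ returns a new list
print(d, c is d)   # [] False
```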

answered Nov 08 '22 at 20:11 by Fred Foo


x += a modifies x in place: the data will be cast to int on assignment. x = x + a assigns the result of x + a to the name x, and in this case x + a is promoted to float.
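The same point can be seen with NumPy arrays and a second name bound to the array. This sketch adds an integer in the in-place case (so the cast is allowed) and a float in the rebinding case, to show both the shared buffer and the dtype promotion.

```python
import numpy as np

x = np.arange(3)
alias = x                  # second name for the same array object
x += 1                     # in place: same buffer, dtype stays integral
print(alias, x is alias)   # [1 2 3] True

x = np.arange(3)
alias = x
x = x + 0.5                # new float64 array; only the name x is rebound
print(alias, x)            # [0 1 2] [0.5 1.5 2.5]
print(x is alias)          # False
```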

answered Nov 08 '22 at 19:11 by YXD