
Why does Python/NumPy's += mutate the original array?

Tags:

python

numpy

import numpy as np

W = np.array([0, 1, 2])
W1 = W
W1 += np.array([2, 3, 4])
print(W)   # [2 4 6]

W = np.array([0, 1, 2])
W1 = W
W1 = W1 + np.array([2, 3, 4])
print(W)   # [0 1 2]

The first snippet mutates W, but the second one does not. Why?

henry asked Mar 10 '16 at 07:03


2 Answers

This applies to most mutable collections, not just NumPy arrays, and it comes down to how Python binds names to objects: with a mutable collection, var1 += var2 is not the same as var1 = var1 + var2. I'll explain it as far as I understand it; the explanation can certainly be improved, so edits and criticism are welcome.

print("1:")
x1 = [7]
y1 = x1
y1 += [3]
print("{} {}".format(x1, id(x1)))
print("{} {}".format(y1, id(y1)))

print("2:")
x2 = [7]
y2 = x2
y2 = y2 + [3]
print("{} {}".format(x2, id(x2)))
print("{} {}".format(y2, id(y2)))

Output:

1:
[7, 3] 40229784 # first id
[7, 3] 40229784 # same id
2:
[7]    40228744 # first id
[7, 3] 40230144 # new id

Writing var1 = var1 + var2 evaluates var1 + var2, which builds a new object with a new id, and then rebinds the name var1 to that new object. Any other name that pointed at the old object (x2 above) still sees the old value. With var1 += var2, the list extends itself in place, so the object keeps its id and every name bound to it (x1 and y1 above) sees the change.
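The same experiment can be repeated with the arrays from the question. This is a small sketch that uses `is` instead of `id()` to check whether two names refer to the same object; it only reuses the question's values.

```python
import numpy as np

# Case 1: += works in place, so W and W1 remain the same object.
W = np.array([0, 1, 2])
W1 = W
W1 += np.array([2, 3, 4])
print(W1 is W, W)          # True [2 4 6]

# Case 2: + builds a new array, and only W1 is rebound to it.
W = np.array([0, 1, 2])
W1 = W
W1 = W1 + np.array([2, 3, 4])
print(W1 is W, W, W1)      # False [0 1 2] [2 4 6]
```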

Goodies answered Oct 05 '22 at 22:10


In Python, the + operator dispatches to the __add__ method of the left operand or, failing that, to the __radd__ method of the right operand. __radd__ is only tried when the left operand has no __add__ or its __add__ returns NotImplemented (or when the right operand's type is a subclass of the left operand's type), so we can ignore it here.
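To make the dispatch order concrete, here is a minimal sketch. `Meters` is a hypothetical toy class invented only for illustration. Its `__add__` returns `NotImplemented` for operands it does not understand, and `__radd__` is only reached when the left operand (an `int` here) gives up.

```python
class Meters:
    """Hypothetical toy class used only to show operator dispatch."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented          # tells Python to try other.__radd__

    def __radd__(self, other):
        # Reached for 5 + m, because int.__add__(5, m) returns NotImplemented.
        return self.__add__(other)


m = Meters(2)
print((m + 5).value)   # 7, handled by Meters.__add__
print((5 + m).value)   # 7, int.__add__ gives up, so Meters.__radd__ runs
```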

The += operator dispatches to the __iadd__ method, if one is defined. If the left operand has no __iadd__ (or its __iadd__ returns NotImplemented), a += b falls back to a = a + b.
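A sketch of both paths, using two hypothetical classes: `Bag` defines `__iadd__` and mutates itself, while `FrozenBag` only defines `__add__`, so `+=` falls back to building a new object and rebinding the name. Plain `int`, which has no `__iadd__`, follows the same fallback.

```python
class Bag:
    """Hypothetical mutable container that defines __iadd__."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return Bag(self.items + list(other.items))

    def __iadd__(self, other):
        self.items.extend(other.items)   # mutate in place...
        return self                      # ...and return self, which += rebinds to the name


class FrozenBag:
    """Hypothetical immutable-style container: __add__ only, no __iadd__."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        return FrozenBag(self.items + tuple(other.items))


a = Bag([1])
alias = a
a += Bag([2])
print(alias is a, alias.items)            # True [1, 2]

f = FrozenBag([1])
f_alias = f
f += FrozenBag([2])                       # behaves like f = f + FrozenBag([2])
print(f_alias is f, f_alias.items, f.items)  # False (1,) (1, 2)

n = 1
n_alias = n
n += 1                                    # int has no __iadd__ either
print(n_alias, n)                         # 1 2
```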

The thing to remember is that a += b is not just a.__iadd__(b) (or type(a).__iadd__(a, b)); it is a = type(a).__iadd__(a, b). On the one hand, this forced assignment is what lets += make sense for immutable types like int: they cannot change in place, so the name is simply rebound to the new object that __add__ returns. On the other hand, the following fails with a TypeError even though the list is extended in place before the assignment is attempted:

tup = (['a'], ['b'])
tup[0] += ['c']
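Catching the error shows that the first half of the operation (the in-place `list.__iadd__`) has already run by the time the second half (assigning back into the tuple) fails. This is a small sketch built on the tuple example above.

```python
tup = (['a'], ['b'])
raised = False
try:
    # Step 1: list.__iadd__ extends tup[0] in place.
    # Step 2: tup[0] = <that list> fails, because tuples are immutable.
    tup[0] += ['c']
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

print(tup)   # (['a', 'c'], ['b']): the mutation happened anyway
```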

NumPy arrays are mutable objects with clearly defined in-place operations. If a and b are arrays of the same shape (or b can be broadcast to the shape of a), a += b adds the two arrays together, using a as the output buffer. The function form of the operation is a = np.ndarray.__iadd__(a, b), which modifies a and returns a.
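Calling the function form directly makes both points visible: the return value is the very same object, and a smaller operand is broadcast into the existing buffer. This is a sketch; the shapes are chosen only for illustration, and the in-place result must keep the shape of `a`.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.int64)
row = np.array([1, 2, 3])              # shape (3,) broadcasts against shape (2, 3)

result = np.ndarray.__iadd__(a, row)   # function form of a += row
print(result is a)                     # True: __iadd__ returns a itself
print(a)
# [[1 2 3]
#  [1 2 3]]
```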

Similarly, a = a + b is equivalent to a = np.ndarray.__add__(a, b). Unlike __iadd__, however, __add__ creates and returns a completely new array to hold the result, which is then assigned to a.
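One practical consequence is that the difference shows up across function boundaries. In this sketch, `shift_in_place` and `shift_copy` are hypothetical helpers. The first one modifies the caller's array. The second one only rebinds its local parameter to a new array.

```python
import numpy as np


def shift_in_place(arr, delta):
    arr += delta          # __iadd__: writes into the caller's array
    return arr


def shift_copy(arr, delta):
    arr = arr + delta     # __add__: new array, only the local name is rebound
    return arr


data = np.array([1, 2, 3])
copied = shift_copy(data, 10)
print(data, copied)          # [1 2 3] [11 12 13]

same = shift_in_place(data, 10)
print(data, same is data)    # [11 12 13] True
```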

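This also affects the output dtype. If a has dtype=int32 and b has dtype=float64, the in-place operation cannot change the dtype of a. Older NumPy versions silently cast the float64 result back into a, truncating it. Since NumPy 1.10, the default casting rule for in-place ufuncs is 'same_kind', so a += b raises a TypeError instead, and truncation only happens if you request it explicitly with np.add(a, b, out=a, casting='unsafe'). In that case the sum is computed in float64 and the result is truncated, rather than b's values. The result of a + b, by contrast, gets the wider type, which is float64 in this example.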
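A sketch of the three cases on a recent NumPy (1.10 or later is assumed). The values are chosen only so that the truncation is easy to see.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.7, 0.7, 0.7])             # float64

print((a + b).dtype)                      # float64: new array with the wider type

refused = False
try:
    a += b                                # a would have to stay int32
except TypeError as exc:                  # rejected under the default 'same_kind' casting
    refused = True
    print("refused:", exc)

# Explicit opt-in: the float64 sums (1.7, 2.7, 3.7) are truncated into int32.
np.add(a, b, out=a, casting="unsafe")
print(a, a.dtype)                         # [1 2 3] int32
```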

All the basic arithmetic operators have equivalent function implementations (ufuncs) in NumPy. a = a + b is equivalent to a = np.add(a, b). a += b is equivalent to a = np.add(a, b, out=a).
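The two forms can be checked directly. When `out` is given, the ufunc writes into that array and returns the same object. Without `out`, it allocates a new one. The `np.multiply` line is an extra illustration of the same pattern for `*=`.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

c = np.add(a, b)             # like c = a + b: allocates a new array
print(c is a)                # False

out = np.add(a, b, out=a)    # like a += b: writes into a and returns it
print(out is a, a)           # True [11 22 33]

np.multiply(a, 2, out=a)     # like a *= 2
print(a)                     # [22 44 66]
```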

Mad Physicist answered Oct 05 '22 at 20:10