
How do numpy's in-place operations (e.g. `+=`) work?

Tags:

python

numpy

The basic question is: What happens under the hood when doing: a[i] += b?

Given the following:

import numpy as np
a = np.arange(4)
i = a > 0
i
= array([False,  True,  True,  True], dtype=bool)

I understand that:

  • a[i] = x is the same as a.__setitem__(i, x), which assigns directly to the items indicated by i
  • a += x is the same as a.__iadd__(x), which does the addition in place

But what happens when I do:

a[i] += x

Specifically:

  1. Is this the same as a[i] = a[i] + x? (which is not an in-place operation)
  2. Does it make a difference in this case if i is:
    • an int index, or
    • an ndarray, or
    • a slice object

Background

The reason I started delving into this is that I encountered a non-intuitive behavior when working with duplicate indices:

a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)  # duplicate indices (np.int was removed in newer NumPy releases)
a[indices] += x
a
= array([ 3.,  0.,  0.,  0.])

More interesting stuff about duplicate indices in this question.

shx2 asked Apr 16 '13 10:04



4 Answers

The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x); instead, it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice, it usually is). This means a[i] += x trivially maps to:

a.__setitem__(i, a.__getitem__(i).__iadd__(x))

So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called __add__, though.
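How much this matters in NumPy depends on what __getitem__ returns for the index. The following is a minimal sketch (variable names are illustrative, not from the answer), relying on the fact that basic slicing returns a view while boolean or integer-array indexing returns a copy:

```python
import numpy as np

a = np.arange(4)

# Slice: __getitem__ returns a view, so __iadd__ on it already writes into a's buffer.
view = a[1:3]
slice_shares = np.shares_memory(view, a)

# Boolean (or integer-array) index: __getitem__ returns a copy.
mask = a > 0
tmp = a[mask]
mask_shares = np.shares_memory(tmp, a)

# The explicit three-step form of a[mask] += 10:
tmp = a.__getitem__(mask)   # temporary copy of the selected items
tmp = tmp.__iadd__(10)      # in-place add, but only on the temporary
a.__setitem__(mask, tmp)    # write the temporary back into a
print(slice_shares, mask_shares, a)
```

With an int index on a 1-D array, __getitem__ returns a NumPy scalar rather than an array, so the addition produces a new scalar that __setitem__ then stores.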

lvc answered Nov 02 '22 10:11


Actually, that has nothing to do with numpy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side is actually computed in place, so it is a[indices] = (a[indices] += x), if that were legal syntax. That has largely the same effect, though.)
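Applied to the duplicate-index example from the question, the equivalence can be spelled out as a rough sketch (rhs is just an illustrative name):

```python
import numpy as np

a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)   # every entry points at a[0]

# The right-hand side is computed once, from the ORIGINAL values of a.
rhs = a[indices] + x               # a[indices] is all zeros, so rhs equals x

# The assignment then writes each rhs value to a[0]; nothing is accumulated.
a[indices] = rhs
print(rhs, a)
```

Only one of the written values survives in a[0]. The question's output shows the last one (3) remaining, though as far as I know NumPy does not promise a particular order for repeated indices in assignments.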

Of course a += x actually is in-place, by mapping a to the np.add out argument.
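A small sketch of what that looks like when written out by hand (b is an illustrative second name for the same array):

```python
import numpy as np

a = np.arange(4, dtype=float)
b = a                              # second name for the same array

# Roughly what a += 1.0 does for ndarrays: the result is written into a itself.
result = np.add(a, 1.0, out=a)
same_object = result is a          # the out array is what gets returned
print(same_object, b)
```

Because no new array is allocated, b sees the updated values as well.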

It has been discussed before and numpy cannot do anything about it as such. Though there is an idea to have a np.add.at(array, index_expression, x) to at least allow such operations.
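NumPy later added this idea as the at method on ufuncs (ufunc.at, which I believe arrived in NumPy 1.8). A minimal sketch using the arrays from the question:

```python
import numpy as np

a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)   # duplicate indices, all pointing at a[0]

# Unbuffered in-place add: every occurrence of an index contributes.
np.add.at(a, indices, x)
print(a)   # [6. 0. 0. 0.]
```

Here a[0] accumulates 0 + 1 + 2 + 3, unlike a[indices] += x, where only one write survives.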

seberg answered Nov 02 '22 10:11


I don't know what's going on under the hood, but in-place operations on items in NumPy arrays and in Python lists will return the same reference, which IMO can lead to confusing results when passed into a function.

Start with Python

>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345

... where 12345 is a unique id for the location of the value at a[2] in memory, which is the same as b[2].

So a and b refer to the same list in memory. Now try in-place addition on an item in the list.

>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890

So in-place addition on the item only changed the value at index 2; a and b still reference the same list, although the third item in the list was reassigned to a new value, 7. This reassignment also explains why, if a and b were integers (or floats) instead of lists (say a = 4 and b = a), a += 1 would rebind a, and a and b would then be different references. However, if list addition is used, e.g. a += [5] with a and b referencing the same list, a is not reassigned; both names see the appended item.
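The contrast between immutable and mutable operands can be checked directly. A short sketch in plain Python (p and q are illustrative names):

```python
# Immutable int: += cannot modify the object, so the name is rebound to a new one.
a = 4
b = a
a += 1
ints_still_shared = a is b       # a is now 5, b is still 4

# Mutable list: += calls list.__iadd__, which extends in place and returns self.
p = [1, 2, 3]
q = p
p += [5]
lists_still_shared = p is q      # both names refer to the same, longer list

# In-place add on an item: the list object is unchanged, only slot 2 is rebound.
p[2] += 4
print(a, b, p, q)
```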

Now for NumPy

>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True

Again these are the same reference, and in-place operators seem to have the same effect as for lists in Python:

>>> a += 4
>>> a
array([ 5.,  6.,  7.])
>>> b
array([ 5.,  6.,  7.])

In-place addition on an ndarray modifies the array that both names refer to. This is not the same as a = a + 4 (numpy.add without an out argument), which creates a new array and rebinds a to it:

>>> a = a + 4
>>> a
array([  9.,  10.,  11.])
>>> b
array([ 5.,  6.,  7.])

In-place operations on borrowed references

I think the danger here is if the reference is passed to a different scope.

>>> def f(x):
...     x += 4
...     return x

The reference is passed into the scope of f, which does not make a copy: x += 4 changes the caller's array in place, and f passes that same array back.

>>> f(a)
array([ 13.,  14.,  15.])
>>> f(a)
array([ 17.,  18.,  19.])
>>> f(a)
array([ 21.,  22.,  23.])
>>> f(a)
array([ 25.,  26.,  27.])

The same would be true for a Python list as well:

>>> def f(x, y):
...     x += [y]

>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]

IMO this can be confusing and sometimes difficult to debug, so I try to only use in-place operators on references that belong to the current scope, and I try to be careful with borrowed references.
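One way to follow that rule is to use the non-in-place operator on arguments. A minimal sketch, with hypothetical function names chosen only for this illustration:

```python
import numpy as np

def add_four_inplace(x):
    x += 4          # mutates the caller's array
    return x

def add_four_copy(x):
    return x + 4    # builds a new array; the caller's array is untouched

a = np.array([1.0, 2.0, 3.0])

out = add_four_copy(a)
a_after_copy = a.copy()   # snapshot: a is unchanged at this point

add_four_inplace(a)       # now a itself has been modified
print(out, a_after_copy, a)
```

The copying version costs one extra allocation per call, which is usually a fair price for not surprising the caller.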

Mark Mikofski answered Nov 02 '22 10:11


As lvc explains, there is no in-place item add method, so under the hood it uses __getitem__, then __iadd__, then __setitem__. Here's a way to empirically observe that behavior:

import numpy

class A(numpy.ndarray):
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

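# A([1,2,3]) would create an uninitialized array of shape (1, 2, 3);
# build the same shape explicitly so that a[0] is an A view, not a scalar.
a = numpy.zeros((1, 2, 3)).view(A)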
print("about to increment a[0]")
a[0] += 1

It prints

about to increment a[0]
getitem
iadd
setitem

Mr Fooz answered Nov 02 '22 09:11