
Numpy sum elements in array based on its value

I have an unsorted array of indices:

i = np.array([1,5,2,6,4,3,6,7,4,3,2])

I also have an array of values of the same length:

v = np.array([2,5,2,3,4,1,2,1,6,4,2])

I have an array of zeros that should hold the result:

d = np.zeros(10)

Now I want to add each value of v to the element of d whose index is given by the corresponding entry of i.

In plain Python I would do it like this:

for index, value in enumerate(v):
    idx = i[index]
    d[idx] += value

It is ugly and inefficient. How can I change it?

Mihail Kondratyev asked Nov 20 '15 02:11




2 Answers

np.add.at(d, i, v)

You'd think d[i] += v would work, but when i contains repeated indices, the additions to the same cell are not accumulated: only one of them takes effect. The ufunc.at method performs the operation unbuffered, so it avoids that problem.
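To make the difference concrete, here is a minimal sketch using the arrays from the question. The names buffered, unbuffered and looped are only illustrative and are not part of the original answer.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Fancy-indexed in-place add: with repeated indices,
# only one contribution per cell survives.
buffered = np.zeros(10)
buffered[i] += v

# Unbuffered add: every contribution is accumulated.
unbuffered = np.zeros(10)
np.add.at(unbuffered, i, v)

# Reference result from the explicit loop in the question.
looped = np.zeros(10)
for idx, value in zip(i, v):
    looped[idx] += value

print(np.array_equal(unbuffered, looped))  # True
print(buffered.sum(), unbuffered.sum())    # the first total is smaller
```

Indices 2, 3, 4 and 6 each appear twice in i, so those are the cells where the two results differ.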

user2357112 supports Monica answered Oct 14 '22 15:10


We can use np.bincount, which is generally efficient for this kind of weighted accumulation. Note that this version assigns the result rather than adding it, so it relies on d starting as zeros:

counts = np.bincount(i,v)
d[:counts.size] = counts

Alternatively, for the general case where d may already hold values that we want to add to, pass the minlength argument so that the result has the same length as d:

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)
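As a quick check of the minlength variant, the sketch below adds into an array that does not start at zero and compares against the explicit loop. The starting value of ones and the name expected are assumptions made only for illustration.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Hypothetical non-zero starting array (not from the original post).
d = np.ones(10)

# Reference: accumulate with the explicit loop on a copy.
expected = d.copy()
for idx, value in zip(i, v):
    expected[idx] += value

# Weighted bincount padded to d's length, then added in place.
d += np.bincount(i, weights=v, minlength=d.size)

print(np.array_equal(d, expected))  # True
```

This assumes every index in i is non-negative and smaller than d.size; otherwise np.bincount raises an error or returns an array longer than d, and the in-place addition fails.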

Runtime tests

This section compares the np.add.at-based approach from the other answer with the np.bincount-based one given earlier in this answer.

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop
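The same comparison can be run outside IPython with the standard timeit module. The following is only a sketch: the helper time_both and its parameters are hypothetical, each timed call also includes allocating a fresh zero array, and the numbers it prints will depend on the NumPy version and hardware.

```python
import timeit

import numpy as np


def bincount_based(d, i, v):
    # Weighted counts padded to d's length, added in place.
    d += np.bincount(i, weights=v, minlength=d.size)


def add_at_based(d, i, v):
    # Unbuffered in-place accumulation.
    np.add.at(d, i, v)


def time_both(n=10_000, m=12_000, repeats=20, seed=0):
    """Return average seconds per call as (add_at, bincount)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, 1000, n)
    v = rng.integers(0, 1000, n)

    # Verify that both approaches agree before timing them.
    d1, d2 = np.zeros(m), np.zeros(m)
    bincount_based(d1, i, v)
    add_at_based(d2, i, v)
    assert np.allclose(d1, d2)

    t_add_at = timeit.timeit(
        lambda: add_at_based(np.zeros(m), i, v), number=repeats
    ) / repeats
    t_bincount = timeit.timeit(
        lambda: bincount_based(np.zeros(m), i, v), number=repeats
    ) / repeats
    return t_add_at, t_bincount


t_add_at, t_bincount = time_both()
print(f"np.add.at:   {t_add_at * 1e6:.1f} µs per call")
print(f"np.bincount: {t_bincount * 1e6:.1f} µs per call")
```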
Divakar answered Oct 14 '22 15:10