
Numpy: conditional sum

I have the following numpy array:

import numpy as np
arr = np.array([[1,2,3,4,2000],
                [5,6,7,8,2000],
                [9,0,1,2,2001],
                [3,4,5,6,2001],
                [7,8,9,0,2002],
                [1,2,3,4,2002],
                [5,6,7,8,2003],
                [9,0,1,2,2003]
              ])

I understand np.sum(arr, axis=0) to provide the result:

array([   40,    28,    36,    34, 16012])

What I would like to do (without a for loop) is sum each column separately for every distinct value of the last column, so that the result is:

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])

I realize that it may be a stretch to do without a loop, but hoping for the best...

If a for loop must be used, then how would that work?

I tried np.sum(arr[:, 4]==2000, axis=0) (where I would substitute 2000 with the variable from the for loop); however, it gave a result of 2.
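The attempt above sums the boolean mask itself, so it counts the matching rows (2) instead of summing them. A minimal sketch of the loop version, assuming the goal is one summed row per distinct value in the last column; the names `years`, `rows`, and `loop_result` are illustrative and not part of the question:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Distinct values of the last column, in sorted order.
years = np.unique(arr[:, -1])

rows = []
for year in years:
    mask = arr[:, -1] == year          # boolean mask, one entry per row
    rows.append(arr[mask].sum(axis=0))  # select the rows first, then sum them

loop_result = np.array(rows)
print(loop_result)
```

The mask is used to index `arr` before summing; passing the mask straight to `np.sum` only adds up its True values.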

Infinity Cliff asked May 01 '18 18:05



2 Answers

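You can do this in pure numpy using np.diff and np.add.reduceat, provided that rows with the same value in the last column are adjacent, as they are here. np.diff gives the differences between consecutive entries of the rightmost column, which are nonzero exactly where the value changes: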

d = np.diff(arr[:, -1])

np.where converts the nonzero entries of d into the integer indices that np.add.reduceat expects:

d = np.where(d)[0]

reduceat also expects a leading zero index, and every index needs to be shifted by one, because entry i of the diff marks a change between rows i and i + 1:

indices = np.r_[0, d + 1]

Using np.r_ here is a bit more convenient than np.concatenate because it allows scalars. The sum then becomes:

result = np.add.reduceat(arr, indices, axis=0)
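To make the steps concrete, here is a sketch that runs them on the array from the question and shows each intermediate value. The variable names `changes` and `starts` are illustrative choices for this sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Differences between consecutive keys: nonzero where the key changes.
diffs = np.diff(arr[:, -1])        # [0, 1, 0, 1, 0, 1, 0]

# Positions of the nonzero differences.
changes = np.where(diffs)[0]       # [1, 3, 5]

# First row of each group: row 0, plus the row after each change.
starts = np.r_[0, changes + 1]     # [0, 2, 4, 6]

# Sum the rows in each slice starts[i]:starts[i+1] (the last runs to the end).
step_result = np.add.reduceat(arr, starts, axis=0)
print(step_result)
```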

This can be combined into a one-liner of course:

>>> result = np.add.reduceat(arr, np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1], axis=0)
>>> result
array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
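The reduceat approach only works when equal keys sit in consecutive rows. If that is not guaranteed, one option is to sort the rows by the last column first. The sketch below assumes a hypothetical shuffled copy of the question's data (`shuffled` is not from the original post) and uses a stable argsort:

```python
import numpy as np

# Hypothetical input: the same rows as in the question, shuffled so that
# equal values in the last column are no longer adjacent.
shuffled = np.array([[9, 0, 1, 2, 2001],
                     [1, 2, 3, 4, 2000],
                     [7, 8, 9, 0, 2002],
                     [5, 6, 7, 8, 2003],
                     [5, 6, 7, 8, 2000],
                     [3, 4, 5, 6, 2001],
                     [9, 0, 1, 2, 2003],
                     [1, 2, 3, 4, 2002]])

# Reorder the rows so that equal keys become adjacent.
order = np.argsort(shuffled[:, -1], kind="stable")
grouped = shuffled[order]

# Same diff / where / reduceat steps as above, applied to the sorted rows.
starts = np.r_[0, np.where(np.diff(grouped[:, -1]))[0] + 1]
sorted_result = np.add.reduceat(grouped, starts, axis=0)
print(sorted_result)
```

The sort costs O(n log n), so it is worth skipping when the data is already known to be grouped.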
Mad Physicist answered Sep 18 '22 01:09


I'm posting a simple solution with pandas and one with itertools.

import pandas as pd
df = pd.DataFrame(arr)
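# Sum columns 0-3 per group, then put the key column back in last position.
x = df.groupby(4).sum().reset_index()[list(range(5))]
# The key column is not summed by groupby; doubling it reproduces the
# expected output only because every group here has exactly two rows.
x[4] *= 2
np.array(x)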

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
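If the groups do not all contain exactly two rows, doubling the key column no longer gives its sum. A sketch of one way around that, assuming it is acceptable to pass the key values as a separate array so that column 4 stays in the frame and is summed like the others (`summed` and `pandas_result` are illustrative names):

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

df = pd.DataFrame(arr)

# Grouping by an external array (rather than by column label) keeps
# column 4 among the summed columns, in its original position.
summed = df.groupby(df[4].to_numpy()).sum()

pandas_result = summed.to_numpy()
print(pandas_result)
```

This version also does not depend on the rows being sorted, since groupby collects matching keys wherever they appear.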

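You can also use itertools. Note that itertools.groupby only groups consecutive rows, so equal keys must be adjacent:

import itertools
np.array([sum(x[1]) for x in itertools.groupby(arr, key = lambda k: k[-1])])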

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
rafaelc answered Sep 22 '22 01:09