I have the following numpy array:
import numpy as np
arr = np.array([[1,2,3,4,2000],
[5,6,7,8,2000],
[9,0,1,2,2001],
[3,4,5,6,2001],
[7,8,9,0,2002],
[1,2,3,4,2002],
[5,6,7,8,2003],
[9,0,1,2,2003]
])
I understand that np.sum(arr, axis=0) gives the result:
array([ 40, 28, 36, 34, 16012])
What I would like to do (without a for loop) is sum the rows grouped by the value in the last column, so that the result is:
array([[ 6, 8, 10, 12, 4000],
[ 12, 4, 6, 8, 4002],
[ 8, 10, 12, 4, 4004],
[ 14, 6, 8, 10, 4006]])
I realize it may be a stretch to do this without a loop, but I'm hoping for the best.
If a for loop must be used, how would that work?
I tried np.sum(arr[:, 4]==2000, axis=0) (where I would substitute the loop variable for 2000), but it gave a result of 2.
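If a loop is acceptable, one way to make that attempt work is to use the comparison as a boolean mask that selects the matching rows, and then sum those rows. A minimal sketch, looping over the distinct values that np.unique finds in the last column:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

rows = []
for key in np.unique(arr[:, -1]):       # distinct values of the last column, sorted
    mask = arr[:, -1] == key            # True for the rows that belong to this group
    rows.append(arr[mask].sum(axis=0))  # select those rows, then sum them column-wise
result = np.array(rows)
print(result)
```

Unlike approaches that look for changes between neighbouring rows, this loop does not need the rows to be sorted by the last column.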
About numpy.sum() in Python: the numpy.sum() function, provided by the NumPy package, computes the sum of all elements of an array, or the sums of each row or each column when an axis is given.
To add two arrays element-wise, use numpy.add(arr1, arr2). The two arrays must have the same shape, or shapes that are compatible under NumPy's broadcasting rules. A shorter array is not padded with zeros: if the shapes are incompatible, numpy.add raises a ValueError.
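A short sketch of how np.add handles shapes, using small made-up arrays: equal shapes add element-wise, a compatible 1-D row is broadcast across every row, and incompatible shapes raise a ValueError instead of being zero-padded.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
row = np.array([10, 20, 30])

same = np.add(a, a)     # shapes (2, 3) + (2, 3): plain element-wise addition
bcast = np.add(a, row)  # shapes (2, 3) + (3,): row is broadcast over both rows

try:
    np.add(a, np.array([1, 2]))  # shapes (2, 3) + (2,): trailing sizes 3 and 2 differ
    error_raised = False
except ValueError:
    error_raised = True          # incompatible shapes are an error, not zero-padded

print(same)
print(bcast)
print(error_raised)
```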
numpy.sum() syntax: the elements of the input array are summed. If axis is not provided, the sum of all the elements is returned. If axis is a tuple of ints, the sum is taken over all of the given axes. The dtype argument specifies the data type of the returned output.
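A small sketch of those options on a made-up 3-D array: no axis, a single axis, a tuple of axes, and an explicit dtype.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)        # values 0..23

total = np.sum(x)                         # no axis: sum of every element
per_first = np.sum(x, axis=0)             # collapse axis 0, result has shape (3, 4)
over_two = np.sum(x, axis=(1, 2))         # tuple of axes: one sum per x[i], shape (2,)
as_float = np.sum(x, dtype=np.float64)    # same total, returned as float64

print(total, per_first.shape, over_two, as_float)
```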
When np.sum() receives an array of booleans, it counts True as 1 and False as 0 and returns the total. For instance, np.sum([True, True, False]) returns 2. That is why np.sum(arr[:, 4]==2000, axis=0) returned 2: it counted the rows whose last column equals 2000 rather than summing those rows.
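A quick illustration of that counting behaviour, using only the last column of the example array. np.count_nonzero gives the same count and states the intent more directly.

```python
import numpy as np

n_true = np.sum([True, True, False])   # True counts as 1, False as 0

last_col = np.array([2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003])
mask = last_col == 2000                # boolean array, True for the first two rows
count = mask.sum()                     # number of rows whose key is 2000
same_count = np.count_nonzero(mask)    # equivalent count

print(n_true, count, same_count)
```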
You can do this in pure numpy using a clever application of np.diff and np.add.reduceat. This relies on rows with the same value in the last column being adjacent, as they are here. np.diff will be nonzero wherever the rightmost column changes:
d = np.diff(arr[:, -1])
np.where, called with a single argument, returns the positions of the nonzero entries of d, converting it into the integer indices that np.add.reduceat expects:
d = np.where(d)[0]
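reduceat also expects the first index to be zero, and everything needs to be shifted by one: a nonzero d[i] means the key changes between rows i and i + 1, so the new group starts at row i + 1:
indices = np.r_[0, d + 1]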
Using np.r_ here is a bit more convenient than np.concatenate because it allows scalars. The sum then becomes:
result = np.add.reduceat(arr, indices, axis=0)
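To make the steps concrete, here is a sketch that keeps each intermediate array for the example data. The last two lines show what np.add.reduceat does on a plain 1-D array: each index starts a slice that runs up to the next index, and the final slice runs to the end.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

d = np.diff(arr[:, -1])            # [0, 1, 0, 1, 0, 1, 0]: nonzero where the key changes
changes = np.where(d)[0]           # [1, 3, 5]: positions of the nonzero differences
indices = np.r_[0, changes + 1]    # [0, 2, 4, 6]: first row of each group
result = np.add.reduceat(arr, indices, axis=0)

# 1-D case: sums of x[0:2], x[2:4], x[4:6] and x[6:]
demo = np.add.reduceat(np.arange(8), [0, 2, 4, 6])
print(d, changes, indices, demo, sep="\n")
print(result)
```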
This can be combined into a one-liner of course:
>>> result = np.add.reduceat(arr, np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1], axis=0)
>>> result
array([[ 6, 8, 10, 12, 4000],
[ 12, 4, 6, 8, 4002],
[ 8, 10, 12, 4, 4004],
[ 14, 6, 8, 10, 4006]])
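The np.diff trick only finds group boundaries when equal keys sit next to each other. If the last column might not be sorted, a different technique (not the one used in this answer) is to map each key to a group number with np.unique(..., return_inverse=True) and accumulate rows with the unbuffered np.add.at. A sketch, using a hypothetical shuffled copy of the example rows to show that order does not matter:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Hypothetical permutation: same rows, deliberately out of order
shuffled = arr[[7, 0, 4, 2, 6, 1, 3, 5]]

# keys: sorted distinct values; group[i]: position of row i's key within keys
keys, group = np.unique(shuffled[:, -1], return_inverse=True)

result = np.zeros((len(keys), shuffled.shape[1]), dtype=shuffled.dtype)
np.add.at(result, group, shuffled)  # result[group[i]] += shuffled[i]; repeats accumulate
print(result)
```

The output rows come out in sorted key order, which here matches the expected result.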
I'm posting a simple solution with pandas and one with itertools:
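import pandas as pd
df = pd.DataFrame(arr)
# Group by the values of the last column passed as an array (not as the column label 4),
# so column 4 is summed like the others instead of becoming the index. This avoids
# doubling column 4 afterwards, which only works when every group has exactly two rows.
x = df.groupby(arr[:, -1]).sum()
np.array(x)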
array([[ 6, 8, 10, 12, 4000],
[ 12, 4, 6, 8, 4002],
[ 8, 10, 12, 4, 4004],
[ 14, 6, 8, 10, 4006]])
You can also use itertools:
import itertools
np.array([sum(group) for _, group in itertools.groupby(arr, key=lambda row: row[-1])])
array([[ 6, 8, 10, 12, 4000],
[ 12, 4, 6, 8, 4002],
[ 8, 10, 12, 4, 4004],
[ 14, 6, 8, 10, 4006]])
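One caveat: itertools.groupby starts a new group every time the key changes, so, like the reduceat approach, it assumes rows with equal keys are adjacent. A sketch with a hypothetical reordering shows the split groups, and one possible fix (sorting first with a stable np.argsort):

```python
import itertools

import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

def group_sums(a):
    # Sum each run of consecutive rows that share the same last-column value
    return np.array([sum(g) for _, g in itertools.groupby(a, key=lambda row: row[-1])])

# Hypothetical reordering: keys become 2000, 2001, 2000, 2001, 2002, 2002, 2003, 2003
shuffled = arr[[0, 2, 1, 3, 4, 5, 6, 7]]

unsorted_result = group_sums(shuffled)               # 2000 and 2001 are split: 6 rows
order = np.argsort(shuffled[:, -1], kind="stable")   # bring equal keys together
sorted_result = group_sums(shuffled[order])          # 4 rows, one per key
print(unsorted_result.shape, sorted_result.shape)
```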