Numpy: conditional sum

Tags: python, arrays, numpy, sum

I have the following numpy array:

import numpy as np
arr = np.array([[1,2,3,4,2000],
                [5,6,7,8,2000],
                [9,0,1,2,2001],
                [3,4,5,6,2001],
                [7,8,9,0,2002],
                [1,2,3,4,2002],
                [5,6,7,8,2003],
                [9,0,1,2,2003]
              ])

I understand that np.sum(arr, axis=0) gives:

array([   40,    28,    36,    34, 16012])

What I would like to do (without a for loop) is sum the rows grouped by the value in the last column, so that the result is:

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])

I realize that it may be a stretch to do this without a loop, but I'm hoping for the best...

If a for loop must be used, then how would that work?

I tried np.sum(arr[:, 4]==2000, axis=0) (where I would substitute 2000 with the variable from the for loop); however, it gave a result of 2.
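The expression arr[:, 4] == 2000 is a boolean mask, so summing it just counts the matching rows (two of them) instead of adding those rows together. A minimal loop sketch that uses the mask to select rows instead; taking the loop keys from np.unique of the last column is my own choice, not something from the question:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

mask = arr[:, 4] == 2000
print(np.sum(mask, axis=0))  # 2: counts the True entries, does not add the rows

# Use each mask to select the matching rows, then sum them column-wise.
years = np.unique(arr[:, 4])  # sorted distinct keys: 2000, 2001, 2002, 2003
result = np.array([arr[arr[:, 4] == year].sum(axis=0) for year in years])
print(result)
```

This works regardless of row order, at the cost of one full pass over the array per distinct key.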

asked May 01 '18 18:05

Infinity Cliff

2 Answers

You can do this in pure numpy with a clever combination of np.diff and np.add.reduceat, provided that rows with the same key are contiguous (as they are in your example). np.diff is nonzero wherever the rightmost column changes:

d = np.diff(arr[:, -1])

np.where converts the nonzero entries of d into the integer indices that np.add.reduceat expects:

e = np.where(d)[0]
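For the example array, the two intermediate steps look like this (a small self-contained sketch that recomputes both from scratch):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

d = np.diff(arr[:, -1])  # d[i] = key[i + 1] - key[i], nonzero where the key changes
e = np.where(d)[0]       # positions i where row i and row i + 1 have different keys
print(d)  # [0 1 0 1 0 1 0]
print(e)  # [1 3 5]
```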

reduceat also expects the first group to start at index 0, and every change index needs to be shifted by one, since d[i] compares rows i and i + 1:

indices = np.r_[0, e + 1]

Using np.r_ here is a bit more convenient than np.concatenate because it allows scalars. The sum then becomes:

result = np.add.reduceat(arr, indices, axis=0)
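With strictly increasing indices, np.add.reduceat sums each slice between consecutive indices, and the last slice runs to the end of the array. A quick sketch that checks this reading against explicit slicing (the slice-based version is only there for comparison):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

indices = np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1]  # [0 2 4 6]
result = np.add.reduceat(arr, indices, axis=0)

# Equivalent explicit version: sum arr[start:stop] for each pair of boundaries.
bounds = np.r_[indices, len(arr)]
manual = np.array([arr[start:stop].sum(axis=0)
                   for start, stop in zip(bounds[:-1], bounds[1:])])
print(np.array_equal(result, manual))  # True
```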

This can be combined into a one-liner of course:

>>> result = np.add.reduceat(arr, np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1], axis=0)
>>> result
array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
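This relies on equal keys sitting in consecutive rows. If that might not hold, one option (my own addition, not part of the answer above) is to sort the rows by the last column first; a stable argsort keeps the original order within each group. The shuffled array below is made up for illustration:

```python
import numpy as np

# Same rows as in the question, shuffled so that equal keys are not adjacent.
arr = np.array([[1, 2, 3, 4, 2000],
                [9, 0, 1, 2, 2001],
                [7, 8, 9, 0, 2002],
                [5, 6, 7, 8, 2003],
                [5, 6, 7, 8, 2000],
                [3, 4, 5, 6, 2001],
                [1, 2, 3, 4, 2002],
                [9, 0, 1, 2, 2003]])

order = np.argsort(arr[:, -1], kind="stable")  # row order that groups equal keys
s = arr[order]
indices = np.r_[0, np.where(np.diff(s[:, -1]))[0] + 1]
result = np.add.reduceat(s, indices, axis=0)
print(result)
```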

answered Sep 18 '22 01:09

Mad Physicist

Here is a simple solution with pandas, and one with itertools:

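import pandas as pd
df = pd.DataFrame(arr)
# Group by the values of the last column. Passing them as a separate array
# (instead of groupby(4)) keeps column 4 in the sum, so no "x[4] *= 2" fix-up
# is needed; that trick only worked because every key appears exactly twice.
x = df.groupby(arr[:, -1]).sum()
x.to_numpy()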

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])

You can also use itertools.groupby, which, like reduceat, only groups rows with the same key when they are adjacent:

import itertools
np.array([sum(group) for _, group in itertools.groupby(arr, key=lambda row: row[-1])])

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
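Because itertools.groupby starts a new group every time the key changes, unsorted input would produce the same year more than once. A sketch that sorts the rows by key first (sorted is stable, so within-group order is kept); the shuffled array is made up for illustration:

```python
import itertools

import numpy as np

# Question rows, shuffled so that equal keys are not adjacent.
arr = np.array([[1, 2, 3, 4, 2000],
                [9, 0, 1, 2, 2001],
                [7, 8, 9, 0, 2002],
                [5, 6, 7, 8, 2003],
                [5, 6, 7, 8, 2000],
                [3, 4, 5, 6, 2001],
                [1, 2, 3, 4, 2002],
                [9, 0, 1, 2, 2003]])

key = lambda row: row[-1]

# Without sorting, every key change starts a new group: 8 groups here, not 4.
naive = [sum(group) for _, group in itertools.groupby(arr, key=key)]
print(len(naive))  # 8

# Sorting the rows by the last column first makes equal keys adjacent.
rows = sorted(arr, key=key)
result = np.array([sum(group) for _, group in itertools.groupby(rows, key=key)])
print(result)
```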

answered Sep 22 '22 01:09

rafaelc
