I have a matrix that looks like this:
M = [[1, 200],
[1.8, 100],
[2, 500],
[2.5, 300],
[3, 400],
[3.5, 200],
[5, 200],
[8, 100]]
I want to group the rows by a bin size applied to the left column, e.g. for a bin size of 2 (the first bin covers values from 0-2, the second from 2-4, the third from 4-6, etc.):
[[1, 200],
[1.8, 100],
----
[2, 500],
[2.5, 300],
[3, 400],
[3.5, 200],
----
[5, 200],
----
[8, 100]]
Then output a new matrix with the sum of the right columns for each group:
[200+100, 500+300+400+200, 200, 100]
What is an efficient way to sum each value based on the bin_size boundaries?
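For reference, the grouping can also be sketched in plain Python with a dict keyed on integer division by the bin size (a baseline sketch, not one of the library answers; the `sums` and `bin_size` names are illustrative):

```python
from collections import defaultdict

M = [[1, 200], [1.8, 100], [2, 500], [2.5, 300],
     [3, 400], [3.5, 200], [5, 200], [8, 100]]

bin_size = 2
sums = defaultdict(int)
for key, val in M:
    # floor-divide the left column by the bin size to get the bin index
    sums[int(key // bin_size)] += val

result = [sums[k] for k in sorted(sums)]
# [300, 1400, 200, 100]  (empty bins are skipped entirely)
```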
pandas
Make a DataFrame and then use integer division to define your bins:
import pandas as pd
df = pd.DataFrame(M)
df.groupby(df[0]//2)[1].sum()
#0
#0.0 300
#1.0 1400
#2.0 200
#4.0 100
#Name: 1, dtype: int64
Use .tolist() to get your desired output:
df.groupby(df[0]//2)[1].sum().tolist()
#[300, 1400, 200, 100]
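Note that this groupby drops empty bins entirely (there is no 0 for the 6-8 bin). If you want zeros for empty bins, one option (a sketch, assuming pandas matches the integer labels against the float group index) is to reindex the result:

```python
import pandas as pd

M = [[1, 200], [1.8, 100], [2, 500], [2.5, 300],
     [3, 400], [3.5, 200], [5, 200], [8, 100]]
df = pd.DataFrame(M)

s = df.groupby(df[0] // 2)[1].sum()
# fill in the missing bin 3 (values 6-8) with 0
full = s.reindex(range(int(s.index.max()) + 1), fill_value=0)
full.tolist()
# [300, 1400, 200, 0, 100]
```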
numpy.bincount
import numpy as np
gp, vals = np.transpose(M)
gp = (gp//2).astype(int)
np.bincount(gp, vals)
#array([ 300., 1400., 200., 0., 100.])
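The same idea wrapped in a small helper for an arbitrary bin size (the `bin_sums` name is just for illustration):

```python
import numpy as np

def bin_sums(M, bin_size):
    # split the two columns, bin the left one, weight-sum the right one
    gp, vals = np.transpose(M)
    idx = (gp // bin_size).astype(int)
    return np.bincount(idx, weights=vals)

M = [[1, 200], [1.8, 100], [2, 500], [2.5, 300],
     [3, 400], [3.5, 200], [5, 200], [8, 100]]
bin_sums(M, 2)
# array([ 300., 1400.,  200.,    0.,  100.])
```

Unlike the pandas version, `np.bincount` automatically fills empty bins with 0.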
You can make use of np.digitize and a scipy.sparse.csr_matrix here (this assumes M has been converted to a NumPy array):
import numpy as np
M = np.array(M)
bins = [2, 4, 6, 8, 10]
b = np.digitize(M[:, 0], bins)
v = M[:, 1]
Now perform a vectorized groupby-sum using a csr_matrix:
from scipy import sparse
sparse.csr_matrix(
(v, b, np.arange(v.shape[0]+1)), (v.shape[0], b.max()+1)
).sum(0)
#matrix([[ 300., 1400., 200., 0., 100.]])
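The result is a np.matrix; if you want a flat list like the other answers, convert it (a self-contained sketch of the same approach):

```python
import numpy as np
from scipy import sparse

M = np.array([[1, 200], [1.8, 100], [2, 500], [2.5, 300],
              [3, 400], [3.5, 200], [5, 200], [8, 100]])
bins = [2, 4, 6, 8, 10]
b = np.digitize(M[:, 0], bins)   # bin index per row: [0, 0, 1, 1, 1, 1, 2, 4]
v = M[:, 1]

# CSR with one entry per row: row i holds value v[i] in column b[i];
# summing over axis 0 then gives the per-bin totals
out = sparse.csr_matrix(
    (v, b, np.arange(v.shape[0] + 1)), (v.shape[0], b.max() + 1)
).sum(0)
np.asarray(out).ravel().tolist()
# [300.0, 1400.0, 200.0, 0.0, 100.0]
```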