Binning a numpy array

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.

I suspect there is numpy, scipy, or pandas functionality to do this.

example:

data = [4,2,5,6,7,5,4,3,5,7]

for a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

for a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
deltap asked Feb 20 '14 22:02


2 Answers

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, we'd need to do something like this to drop the last bin when the array length isn't an exact multiple of the bin width:

import numpy as np

width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])

result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
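
If this is needed in more than one place, the truncate-and-reshape expression above can be wrapped in a small function. This is only a sketch: the name bin_means and the width check are assumptions added for illustration, not part of NumPy.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` samples.

    Trailing samples that do not fill a whole bin are dropped.
    `bin_means` is a hypothetical helper name, not a NumPy function.
    """
    data = np.asarray(data)
    if width < 1:
        raise ValueError("width must be a positive integer")
    n_bins = data.size // width        # number of complete bins
    trimmed = data[:n_bins * width]    # drop the incomplete tail, if any
    # Each row of the reshaped array is one bin; average along the rows.
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # means of (4,2), (5,6), (7,5), (4,3), (5,7)
print(bin_means(data, 3))  # means of (4,2,5), (6,7,5), (4,3,5); the final 7 is dropped
```

Slicing before the reshape matters because reshape raises a ValueError when the array size is not a multiple of the requested row length.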
Joe Kington answered Sep 18 '22 15:09


Since you already have a numpy array, you can avoid for loops by using reshape so that each row of the new 2-D array is one bin, then taking the mean along the rows:

In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([3. , 5.5, 6. , 3.5, 6. ])

Actually, this only works if the size of data is divisible by the bin size. I'll edit in a fix.

Looks like Joe Kington has an answer that handles that.
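
Since the question also mentions pandas, here is a minimal sketch of the same drop-the-tail-then-average idea using a groupby on an integer bin label. The variable names (n_full, labels, binned) are illustrative assumptions, and this is one possible approach rather than a dedicated pandas binning function.

```python
import numpy as np
import pandas as pd

width = 3
data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])

n_full = (data.size // width) * width   # samples that fit into complete bins
labels = np.arange(n_full) // width     # bin index for each kept sample: 0,0,0,1,1,1,...
# Group the truncated data by bin index and average each group.
binned = pd.Series(data[:n_full]).groupby(labels).mean()

print(binned.to_numpy())
```

For plain equal-width bins the reshape approach is simpler; the groupby form is mainly convenient when the data already lives in a Series or DataFrame.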

TomAugspurger answered Sep 20 '22 15:09