I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.
I suspect there is numpy, scipy, or pandas functionality to do this.
example:
data = [4,2,5,6,7,5,4,3,5,7]
for a bin size of 2:
bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]
for a bin size of 3:
bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
Just use reshape and then mean(axis=1).
As the simplest possible example:
import numpy as np
data = np.array([4,2,5,6,7,5,4,3,5,7])
print(data.reshape(-1, 2).mean(axis=1))
More generally, we'd need to do something like this to drop the last bin when the array length is not an exact multiple of the bin width:
import numpy as np
width = 3
data = np.array([4,2,5,6,7,5,4,3,5,7])
# Trim to the largest multiple of width, then average each row (one row per bin)
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print(result)
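The trimming step above could be wrapped in a small helper. This is only a sketch: the name bin_means and its argument check are my own additions for illustration, not a numpy function, and it assumes 1-D input.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive bins of `width` samples (1-D input assumed).

    A trailing partial bin is dropped. `bin_means` is a hypothetical
    helper name, not part of numpy.
    """
    data = np.asarray(data)
    if width <= 0:
        raise ValueError("width must be a positive integer")
    n_full = data.size // width        # number of complete bins
    trimmed = data[:n_full * width]    # drop leftover samples at the end
    return trimmed.reshape(n_full, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
means_2 = bin_means(data, 2)
means_3 = bin_means(data, 3)
print(means_2)
print(means_3)
```

If the input is shorter than one bin, this returns an empty array rather than raising an error.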
Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:
In [33]: data.reshape(-1, 2)
Out[33]:
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])
In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([3. , 5.5, 6. , 3.5, 6. ])
Actually, this will only work if the size of data is divisible by n. I'll edit in a fix.
Looks like Joe Kington has an answer that handles that.
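To illustrate the divisibility caveat: with 10 samples and a bin width of 3, reshape raises a ValueError, and trimming the array first avoids it. This is a minimal sketch, and the variable names are only illustrative.

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
n = 3  # bin width; 10 samples do not divide evenly into bins of 3

try:
    data.reshape(-1, n)
    reshape_failed = False
except ValueError:
    # numpy cannot infer the -1 dimension when data.size % n != 0
    reshape_failed = True

usable = data.size - data.size % n   # largest multiple of n that fits
binned = data[:usable].reshape(-1, n).mean(axis=1)
print(reshape_failed, binned)
```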