Smoothing by bin means : In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. Smoothing by bin median : In this method each bin value is replaced by its bin median value.
1. SciPy builds on NumPy. All the numerical code resides in SciPy. The SciPy module consists of all the NumPy functions.
It's probably faster and easier to use numpy.digitize()
:
import numpy
data = numpy.random.random(100)
bins = numpy.linspace(0, 1, 10)
digitized = numpy.digitize(data, bins)
bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]
An alternative to this is to use numpy.histogram()
:
bin_means = (numpy.histogram(data, bins, weights=data)[0] /
numpy.histogram(data, bins)[0])
Try for yourself which one is faster... :)
The Scipy (>=0.11) function scipy.stats.binned_statistic specifically addresses the above question.
For the same example as in the previous answers, the Scipy solution would be
import numpy as np
from scipy.stats import binned_statistic
data = np.random.rand(100)
bin_means = binned_statistic(data, data, bins=10, range=(0, 1))[0]
Not sure why this thread got necroed; but here is a 2014 approved answer, which should be far faster:
import numpy as np
data = np.random.rand(100)
bins = 10
slices = np.linspace(0, 100, bins+1, True).astype(np.int)
counts = np.diff(slices)
mean = np.add.reduceat(data, slices[:-1]) / counts
print mean
The numpy_indexed package (disclaimer: I am its author) contains functionality to efficiently perform operations of this type:
import numpy_indexed as npi
print(npi.group_by(np.digitize(data, bins)).mean(data))
This is essentially the same solution as the one I posted earlier; but now wrapped in a nice interface, with tests and all :)
I would add, and also to answer the question find mean bin values using histogram2d python that the scipy also have a function specially designed to compute a bidimensional binned statistic for one or more sets of data
import numpy as np
from scipy.stats import binned_statistic_2d
x = np.random.rand(100)
y = np.random.rand(100)
values = np.random.rand(100)
bin_means = binned_statistic_2d(x, y, values, bins=10).statistic
the function scipy.stats.binned_statistic_dd is a generalization of this funcion for higher dimensions datasets
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With