I am using pandas.DataFrame.resample to resample random events to 1 hour intervals and am seeing very noisy results that don't go away if I increase the interval to 2 or 4 hours. It makes me wonder whether Pandas has any method for generating a smoothed density estimate, such as a Gaussian kernel density method with an adjustable bandwidth to control the smoothing. I'm not seeing anything in the documentation, but thought I would post here before posting on the developer mailing list, since that is their preference. Scikit-Learn has precisely the Gaussian kernel density function that I want, so I will try to make use of it, but it would be a fantastic addition to Pandas.
Any help is greatly appreciated!
hourly[0][344:468].plot()
The Gaussian kernel: the 'kernel' for smoothing defines the shape of the function that is used to take the weighted average of the neighboring points. A Gaussian kernel is a kernel with the shape of a Gaussian (normal distribution) curve.
Generate Kernel Density Estimate plot using Gaussian kernels. In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
Kernel smoothing is the most popular nonparametric approach to constructing an estimated PMF or PDF. It generalizes the idea of a moving average. In technical terms, a kernel smoother redistributes mass around an observation according to two inputs: a kernel function and a bandwidth.
A density plot is a type of data visualization. It is a variation of the histogram that uses 'kernel smoothing' when plotting the values, giving a continuous, smooth version of the histogram inferred from the data.
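To make the kernel and bandwidth idea concrete, here is a minimal NumPy-only sketch of a Gaussian kernel density estimate over event times. The function name gaussian_kde_1d, the synthetic event_hours data, and the two bandwidth values are all made up for illustration; this is not a pandas API, just the textbook formula written out.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`.

    Hypothetical helper for illustration: each sample contributes one
    Gaussian bump of width `bandwidth`, and the bumps are averaged.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

# Synthetic stand-in for the event data: 300 random event times, in hours
rng = np.random.default_rng(0)
event_hours = rng.uniform(0.0, 120.0, size=300)

# Evaluation grid with a 0.25 hour step, padded past both ends of the data
grid = np.linspace(-24.0, 144.0, 673)

density_narrow = gaussian_kde_1d(event_hours, grid, bandwidth=1.0)  # wiggly
density_wide = gaussian_kde_1d(event_hours, grid, bandwidth=6.0)    # smooth

# Scale the density by the number of events to get an events-per-hour rate
events_per_hour = density_wide * event_hours.size
```

A larger bandwidth spreads each event over more hours, which is the adjustable smoothing asked about in the question. scipy.stats.gaussian_kde and sklearn.neighbors.KernelDensity provide tested versions of the same estimator.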
Pandas has the ability to apply an aggregation over a rolling window. The win_type parameter controls the window's shape. The center parameter can be set so that the labels are placed at the center of the window instead of the right edge. To do Gaussian smoothing:
import pandas as pd
# win_type='gaussian' requires SciPy; std is the Gaussian width, in samples
hrly = pd.Series(hourly[0][344:468])
smooth = hrly.rolling(window=5, win_type='gaussian', center=True).mean(std=0.5)
http://pandas.pydata.org/pandas-docs/stable/computation.html#rolling
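Since the hourly data from the question is not shown, here is a self-contained sketch of the same approach on synthetic events. The event timestamps, the window of 9 and the std of 2.0 are assumptions chosen for illustration. Weighted windows in pandas rely on SciPy, so this needs SciPy installed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: 300 events over about 5 days
rng = np.random.default_rng(0)
start = pd.Timestamp("2024-01-01")
offsets = pd.to_timedelta(np.sort(rng.uniform(0, 120 * 3600, size=300)), unit="s")
events = pd.Series(1, index=start + offsets)

# Count events per 1 hour bin (bins with no events sum to 0)
hourly_counts = events.resample("60min").sum()

# Gaussian-weighted centered rolling mean; std is measured in samples (hours here)
smooth = hourly_counts.rolling(window=9, win_type="gaussian", center=True).mean(std=2.0)
```

With center=True and no min_periods, the values near both ends of the result are NaN because the window does not fit there. Increasing std (and window along with it) plays the role of a larger bandwidth.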
I have now found that this option is available in pandas.stats.moments.ewma and it works quite nicely. Here are the results:
from pandas.stats.moments import ewma
hourly[0][344:468].plot(style='b')
ewma(hourly[0][344:468], span=35).plot(style='k')
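Note that the pandas.stats.moments functions were deprecated and later removed from pandas, so the import above fails on current versions. A sketch of the equivalent call with the Series.ewm method, using synthetic Poisson counts as a stand-in for the hourly data (the data and variable names are assumptions):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for hourly[0][344:468]: 124 hourly event counts
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=124, freq="60min")
hourly_counts = pd.Series(rng.poisson(3.0, size=124), index=index)

# Exponentially weighted moving average; span=35 corresponds to alpha = 2 / (35 + 1)
smoothed = hourly_counts.ewm(span=35).mean()
```

Calling hourly_counts.plot(style='b') and smoothed.plot(style='k') reproduces the two-line plot from the answer. Keep in mind that an exponentially weighted mean only looks backward in time, so the smoothed curve lags the data, unlike a centered Gaussian window.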