
Gaussian kernel density smoothing for pandas.DataFrame.resample?

I am using pandas.DataFrame.resample to resample random events into 1-hour intervals, and the results are very noisy. The noise does not go away if I increase the interval to 2 or 4 hours. That makes me wonder whether pandas has a method for producing a smoothed density estimate, such as a Gaussian kernel density estimate with an adjustable bandwidth to control the amount of smoothing. I don't see anything in the documentation, but I thought I would ask here before posting to the developer mailing list, since that is their preference. Scikit-Learn has precisely the Gaussian kernel density function that I want, so I will try to make use of it, but it would be a fantastic addition to pandas.

Any help is greatly appreciated!

hourly[0][344:468].plot()


user3654387 asked Nov 24 '14 07:11




2 Answers

Pandas can apply an aggregation over a rolling window. The win_type parameter sets the window's shape, that is, the weights applied to the points inside the window (weighted windows require SciPy). Setting center=True places each label at the center of its window instead of at the right edge. To do Gaussian smoothing:

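import pandas as pd

# std is the Gaussian's standard deviation in samples; raise it (and window) to smooth more
hrly = pd.Series(hourly[0][344:468])
smooth = hrly.rolling(window=5, win_type='gaussian', center=True).mean(std=0.5)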

http://pandas.pydata.org/pandas-docs/stable/computation.html#rolling
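To see how the bandwidth controls the smoothing, here is a minimal sketch that builds the Gaussian weights by hand with NumPy instead of using rolling(win_type='gaussian'). The helper name gaussian_smooth, the truncate parameter, and the synthetic Poisson counts are assumptions made for illustration; they are not part of pandas or of the asker's data.

```python
import numpy as np
import pandas as pd


def gaussian_smooth(series, std, truncate=4.0):
    """Smooth a regularly sampled Series with a normalized Gaussian kernel.

    std is the bandwidth in samples; the kernel is cut off at
    truncate * std samples on either side of the center.
    """
    radius = int(truncate * std + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / std) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so the overall level is preserved
    # Repeat the edge values so the output keeps the input length
    # and the ends are not dragged toward zero.
    padded = np.pad(series.to_numpy(dtype=float), radius, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    return pd.Series(smoothed, index=series.index)


# Synthetic stand-in for hourly event counts (hypothetical data).
rng = np.random.default_rng(0)
index = pd.date_range("2014-11-01", periods=124, freq=pd.Timedelta(hours=1))
counts = pd.Series(rng.poisson(lam=5, size=124), index=index)

light = gaussian_smooth(counts, std=1.0)  # narrow bandwidth, follows the noise
heavy = gaussian_smooth(counts, std=6.0)  # wide bandwidth, much smoother
```

Plotting counts, light, and heavy on the same axes shows the trade-off: a larger std averages over more neighboring hours and suppresses more of the noise, at the cost of blurring short-lived peaks.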

kb0 answered Sep 30 '22 14:09


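I have now found that this option is available in pandas.stats.moments.ewma (an exponentially weighted moving average), and it works quite nicely. That module has since been removed from pandas; the equivalent call in current releases is Series.ewm(span=...).mean(). Here are the results:

# Originally written as: from pandas.stats.moments import ewma; ewma(data, span=35)
# Current pandas spells the same operation as data.ewm(span=35).mean()
hourly[0][344:468].plot(style='b')
hourly[0][344:468].ewm(span=35).mean().plot(style='k')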
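For readers without the original data, the following is a small self-contained sketch of the same exponentially weighted smoothing using Series.ewm. The Poisson counts are synthetic and purely illustrative. It also shows that span is just another way of writing the decay factor, alpha = 2 / (span + 1).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for hourly event counts (hypothetical data).
rng = np.random.default_rng(1)
index = pd.date_range("2014-11-01", periods=124, freq=pd.Timedelta(hours=1))
counts = pd.Series(rng.poisson(lam=5, size=124), index=index)

# Exponentially weighted moving average; a larger span smooths more.
span = 35
smooth = counts.ewm(span=span).mean()

# The same smoothing expressed through the decay factor alpha.
alpha = 2.0 / (span + 1)
same = counts.ewm(alpha=alpha).mean()
```

Note that an exponentially weighted average only looks backward in time, so the smoothed curve lags behind the data, whereas a centered Gaussian window weights past and future points symmetrically.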


user3654387 answered Sep 30 '22 14:09