
Gaussian sum filter for irregularly spaced points

I have a set of points (x, y) given as two vectors x and y, for example:

from pylab import *
x = sorted(random(30))
y = random(30)
plot(x,y, 'o-')

[Plot of the irregularly spaced (x, y) points]

Now I would like to smooth this data with a Gaussian and evaluate it only at certain (regularly spaced) points on the x-axis, let's say for:

x_eval = linspace(0,1,11)

I got the tip that this method is called a "Gaussian sum filter", but so far I have not found an implementation of it in numpy/scipy, although at first glance it seems like a standard problem. As the x values are not equally spaced, I can't use scipy.ndimage.gaussian_filter1d.

Usually this kind of smoothing is done by going through Fourier space and multiplying with the kernel, but I don't really know if this is possible with irregularly spaced data.
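In formula form, I believe what I'm after at each evaluation point x0 is simply a Gaussian-weighted average of all the data points:

y_smooth(x0) = sum_i( w_i * y_i ) / sum_i( w_i ),  with  w_i = exp(-(x0 - x_i)**2 / (2*sigma**2))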

Thanks for any ideas.

asked Jun 10 '14 by Magellan88




3 Answers

This will blow up for very large datasets, but the proper calculation you are asking for would be done as follows:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # for repeatability
x = np.random.rand(30)
x.sort()
y = np.random.rand(30)

x_eval = np.linspace(0, 1, 11)
sigma = 0.1

# Pairwise distances: each row is one evaluation point,
# each column one data point, shape (len(x_eval), len(x))
delta_x = x_eval[:, None] - x

# Gaussian weight of every data point for every evaluation point
weights = np.exp(-delta_x*delta_x / (2*sigma*sigma)) / (np.sqrt(2*np.pi) * sigma)

# Normalize so the weights for each evaluation point sum to 1
weights /= np.sum(weights, axis=1, keepdims=True)

# Each smoothed value is a weighted average of all y values
y_eval = np.dot(weights, y)

plt.plot(x, y, 'bo-')
plt.plot(x_eval, y_eval, 'ro-')
plt.show()

[Plot: original data (blue) and smoothed result at x_eval (red)]
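Since the blow-up comes from materializing the full (len(x_eval), len(x)) weight matrix at once, one way to stretch this to larger datasets is to process x_eval in chunks. A minimal sketch of that idea (the helper name and default chunk size are mine, not part of the answer above):

import numpy as np

def gaussian_sum_chunked(x, y, x_eval, sigma, chunk=1000):
    # Same normalized weighted average as above, but only `chunk`
    # rows of the weight matrix exist in memory at any time.
    # The constant 1/(sqrt(2*pi)*sigma) prefactor cancels in the
    # normalization, so it is omitted here.
    out = np.empty(len(x_eval))
    for i in range(0, len(x_eval), chunk):
        delta_x = x_eval[i:i+chunk, None] - x
        w = np.exp(-0.5 * (delta_x / sigma)**2)
        out[i:i+chunk] = w.dot(y) / w.sum(axis=1)
    return out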

answered Oct 17 '22 by Jaime


I'll preface this answer by saying that this is more of a DSP question than a programming question...

...that being said, there is a simple two-step solution to your problem.

Step 1: Resample the data

To illustrate this, we can create a random data set with unequal sampling:

import numpy as np
x = np.cumsum(np.random.randint(0,100,100))
y = np.random.normal(0,1,size=100)

This gives something like:

[Plot: unevenly sampled data]

We can resample this data using simple linear interpolation:

nx = np.arange(x.max()) # choose new x axis sampling
ny = np.interp(nx,x,y) # generate y values for each x

This converts our data to:

[Plot: same data resampled evenly]

Step 2: Apply filter

At this stage you can use some of the tools available through scipy to apply a Gaussian filter to the data with a given sigma value:

import scipy.ndimage as ndimage
fx = ndimage.gaussian_filter1d(ny, sigma=100)

Plotting this up against the original data we get:

[Plot: Gaussian filter applied, overlaid on the original data]

The choice of the sigma value determines the width of the filter.
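One caveat: gaussian_filter1d takes sigma in samples, not in x-units. In the example above the resampled axis nx has unit spacing, so the two coincide, but for a general resampling step you would convert first. A sketch of that conversion, reusing x and y from Step 1 (the dx and sigma_x values are illustrative choices of mine, not from the answer):

import numpy as np
import scipy.ndimage as ndimage

dx = 0.5                              # resampling step in x-units (assumed)
nx = np.arange(x.min(), x.max(), dx)  # regular grid with spacing dx
ny = np.interp(nx, x, y)
sigma_x = 50.0                        # desired smoothing width in x-units (assumed)
fx = ndimage.gaussian_filter1d(ny, sigma=sigma_x / dx)  # convert x-units to samples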

answered Oct 17 '22 by ebarr


Based on @Jaime's answer, I wrote a function that implements this, with some additional documentation and the ability to discard estimates far from the data points.

I think confidence intervals could be obtained on this estimate by bootstrapping, but I haven't done this yet.

import numpy as np

def gaussian_sum_smooth(xdata, ydata, xeval, sigma, null_thresh=0.6):
    """Apply gaussian sum filter to data.

    xdata, ydata : array
        Arrays of x- and y-coordinates of data. 
        Must be 1d and have the same length.

    xeval : array
        Array of x-coordinates at which to evaluate the smoothed result

    sigma : float
        Standard deviation of the Gaussian to apply to each data point
        Larger values yield a smoother curve.

    null_thresh : float
        For evaluation points far from data points, the estimate will be
        based on very little data. If the total weight is below this threshold,
        return np.nan at this location. Zero means always return an estimate.
        The default of 0.6 corresponds to approximately one sigma away 
        from the nearest datapoint.
    """
    # Distance between every combination of xdata and xeval
    # each row corresponds to a value in xeval
    # each col corresponds to a value in xdata
    delta_x = xeval[:, None] - xdata

    # Calculate weight of every value in delta_x using Gaussian
    # Maximum weight is 1.0 where delta_x is 0
    weights = np.exp(-0.5 * ((delta_x / sigma) ** 2))

    # Multiply each weight by every data point, and sum over data points
    smoothed = np.dot(weights, ydata)

    # Nullify the result when the total weight is below threshold
    # This happens at evaluation points far from any data
    # 1-sigma away from a data point has a weight of ~0.6
    nan_mask = weights.sum(1) < null_thresh
    smoothed[nan_mask] = np.nan

    # Normalize by dividing by the total weight at each evaluation point
    # Nullification above avoids a divide-by-zero warning here
    smoothed = smoothed / weights.sum(1)


    return smoothed
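
Picking up the bootstrap idea mentioned above, here is a rough sketch of how confidence intervals might be obtained (untested; the function name and the n_boot/ci defaults are my own choices):

def bootstrap_ci(xdata, ydata, xeval, sigma, n_boot=500, ci=95):
    # Resample the (x, y) pairs with replacement, smooth each resample
    # with gaussian_sum_smooth above, then take percentiles across the
    # bootstrap curves at each evaluation point.
    rng = np.random.default_rng(0)
    curves = np.empty((n_boot, len(xeval)))
    for i in range(n_boot):
        idx = rng.integers(0, len(xdata), size=len(xdata))
        curves[i] = gaussian_sum_smooth(xdata[idx], ydata[idx], xeval, sigma)
    lower = np.nanpercentile(curves, (100 - ci) / 2, axis=0)
    upper = np.nanpercentile(curves, 100 - (100 - ci) / 2, axis=0)
    return lower, upper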
answered Oct 17 '22 by cxrodgers