
Local linear regression in R -- locfit() vs locpoly()

I am trying to understand the different behaviors of these two smoothing functions when given apparently equivalent inputs. My understanding was that locpoly just takes a fixed bandwidth argument, while locfit can also include a varying part in its smoothing parameter (a nearest-neighbors fraction, "nn"). I thought setting this varying part to zero in locfit should make the "h" component act like the fixed bandwidth used in locpoly, but this is evidently not the case.

A working example:

    library(KernSmooth)
    library(locfit)
    set.seed(314)

    n <- 100
    x <- runif(n, 0, 1)
    eps <- rnorm(n, 0, 1)
    y <- sin(2 * pi * x) + eps

    plot(x, y)
    lines(locpoly(x, y, bandwidth=0.05, degree=1), col=3)
    lines(locfit(y ~ lp(x, nn=0, h=0.05, deg=1)), col=4)

Produces this plot:

[Figure: plot of smoothers]

locpoly gives the smooth green line, and locfit gives the wiggly blue line. Clearly, locfit has a smaller "effective" bandwidth here, even though the supposed bandwidth parameter has the same value for each.

What are these functions doing differently?

asked Feb 02 '15 16:02 by user1870614



1 Answer

The two parameters both represent smoothing, but they do so in two different ways.

locpoly's bandwidth parameter is measured on the scale of the x-axis. For example, if you change the line x <- runif(n, 0, 1) to x <- runif(n, 0, 10), you will see that the green locpoly line becomes much more squiggly, even though you still have the same number of points (100).
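The scale dependence can be illustrated without any smoothing package. The sketch below is base R only and is not KernSmooth's actual code; the helper name mean_points_in_reach and the two-bandwidth cutoff are assumptions made for illustration. It counts how many observations lie within two bandwidths of each position, first for x on [0, 1] and then for the same points stretched to [0, 10].

```r
# Rough sketch in base R, not KernSmooth's actual code.
# Assumption: observations within `reach` bandwidths of a position are the
# ones that matter for the local fit at that position.
set.seed(314)
n <- 100
x_unit <- runif(n, 0, 1)   # original scale
x_wide <- 10 * x_unit      # the same points stretched to [0, 10]

# Hypothetical helper: average number of observations lying within
# reach * bandwidth of each position on an evenly spaced grid.
mean_points_in_reach <- function(x, bandwidth, reach = 2, grid_size = 201) {
  grid <- seq(min(x), max(x), length.out = grid_size)
  counts <- vapply(grid,
                   function(x0) sum(abs(x - x0) <= reach * bandwidth),
                   integer(1))
  mean(counts)
}

near_unit <- mean_points_in_reach(x_unit, bandwidth = 0.05)
near_wide <- mean_points_in_reach(x_wide, bandwidth = 0.05)
cat("x on [0, 1]: ", near_unit, "points near each position on average\n")
cat("x on [0, 10]:", near_wide, "points near each position on average\n")
```

With the bandwidth held at 0.05, stretching x leaves far fewer points near each position, so each local fit averages over less data and the curve gets rougher.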

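locfit's constant smoothing parameter, h, is also measured on the scale of x when nn = 0, so rescaling x changes its fit as well. The difference is in the kernel. As far as I can tell from the documentation, locfit uses a tricube kernel by default, which is zero outside [-1, 1], so h is the half-width of the fitting window: only points within h of a position receive any weight there. locpoly uses a Gaussian kernel with the bandwidth as its standard deviation, so points several bandwidths away still contribute, and the same nominal value of 0.05 smooths much more heavily.

This also explains the observation made in the comment that changing the value of h to 0.1 makes the two look nearly identical. A locfit window with h = 0.05 spans a width of 0.1, so it contains only about 10% of the data if we have 100 points distributed uniformly from 0 to 1. The standard deviation of a tricube kernel is roughly 0.38 times its half-width, so h has to be about two to three times the Gaussian bandwidth before the two smoothers behave similarly.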
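The "about 10%" figure can be checked directly. This is a minimal sketch in base R, assuming the same seed and sample as in the question; the variable names half_width, positions and share_in_window are mine. It measures the share of observations lying within 0.05 on either side of each position.

```r
# Base-R check of the "about 10%" figure; no smoothing package needed.
set.seed(314)
n <- 100
x <- runif(n, 0, 1)

half_width <- 0.05                          # 0.05 on either side of a position
positions <- seq(0, 1, length.out = 201)    # where the window is centred

# Share of the n observations lying within half_width of each position.
share_in_window <- vapply(positions,
                          function(x0) mean(abs(x - x0) <= half_width),
                          numeric(1))

mean(share_in_window)        # average share across positions
range(share_in_window * n)   # fewest and most points in any single window
```

Windows near 0 and 1 are clipped by the edge of the data, so the average share comes out slightly below 10%, and individual windows can hold noticeably fewer or more points than the average.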

My sources include the documentation for the locfit package and the documentation for the locpoly function.

answered Sep 27 '22 22:09 by znr