I am trying to understand the different behaviors of these two smoothing functions when given apparently equivalent inputs. My understanding was that locpoly just takes a fixed bandwidth argument, while locfit can also include a varying part in its smoothing parameter (a nearest-neighbors fraction, "nn"). I thought setting this varying part to zero in locfit should make the "h" component act like the fixed bandwidth used in locpoly, but this is evidently not the case.
A working example:
    library(KernSmooth)
    library(locfit)
    set.seed(314)
    n <- 100
    x <- runif(n, 0, 1)
    eps <- rnorm(n, 0, 1)
    y <- sin(2 * pi * x) + eps
    plot(x, y)
    lines(locpoly(x, y, bandwidth=0.05, degree=1), col=3)
    lines(locfit(y ~ lp(x, nn=0, h=0.05, deg=1)), col=4)
Produces this plot:
locpoly gives the smooth green line, and locfit gives the wiggly blue line. Clearly, locfit has a smaller "effective" bandwidth here, even though the supposed bandwidth parameter has the same value for each.
What are these functions doing differently?
The two parameters both represent smoothing, but they do so in two different ways.
locpoly's bandwidth parameter is relative to the scale of the x-axis here. For example, if you change the line x <- runif(n, 0, 1) to x <- runif(n, 0, 10), you will see that the green locpoly line becomes much more squiggly, even though you still have the same number of points (100).
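The scale dependence is easy to check. Here is a minimal sketch that reuses the question's data. The roughness helper and the variable names are mine, not part of either package, and the last fit assumes that multiplying both x and the bandwidth by 10 leaves a local linear fit unchanged:

```r
library(KernSmooth)

set.seed(314)
n <- 100
u <- runif(n, 0, 1)
y <- sin(2 * pi * u) + rnorm(n, 0, 1)

# Same responses and the same bandwidth value, but x on two different scales.
fit_unit <- locpoly(u,      y, bandwidth = 0.05, degree = 1)
fit_wide <- locpoly(10 * u, y, bandwidth = 0.05, degree = 1)

# Stretching the bandwidth together with x.
fit_rescaled <- locpoly(10 * u, y, bandwidth = 0.5, degree = 1)

# Hypothetical helper: total variation of the fitted curve as a crude
# roughness measure. na.rm guards against grid points where the window
# holds too little data for the local fit to be defined.
roughness <- function(fit) sum(abs(diff(fit$y)), na.rm = TRUE)

print(c(unit     = roughness(fit_unit),
        wide     = roughness(fit_wide),
        rescaled = roughness(fit_rescaled)))
```

If the bandwidth really is in units of x, the wide fit should come out far rougher than the unit fit, and the rescaled fit should match the unit fit again.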
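locfit's smoothing parameter h is also measured on the scale of x, so rescaling x changes the blue line as well. The part of locfit's smoothing parameter that is a proportion of the data is nn, the nearest-neighbor fraction, which is set to 0 here. The difference lies in the kernel that each bandwidth scales. As far as I can tell from the documentation, locpoly uses a Gaussian kernel by default and treats bandwidth as its standard deviation, so points several bandwidths away still receive some weight. locfit uses a tricube kernel by default and treats h as the half-width of the window, so points farther than h from the fitting point receive no weight at all. With h = 0.05, each local fit only sees points within 0.05 of the target, which for 100 points distributed uniformly from 0 to 1 is about 10 observations, hence the wiggly line.
This also explains the observation made in the comment that changing the value of h to 0.1 makes the two look nearly identical. A tricube kernel with half-width h has a standard deviation of about 0.38h (the square root of 35/243), so h = 0.1 corresponds to a Gaussian standard deviation of roughly 0.04, which is close to the 0.05 used in locpoly. Matching the standard deviations exactly would take an h of roughly 0.13.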
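The h = 0.1 observation can also be checked numerically rather than by eye. This is only a sketch on the question's data. The rms_gap helper is my own, and it compares the two smoothers on locpoly's output grid:

```r
library(KernSmooth)
library(locfit)

set.seed(314)
n <- 100
x <- runif(n, 0, 1)
y <- sin(2 * pi * x) + rnorm(n, 0, 1)

# Reference curve: locpoly with the bandwidth from the question.
ref <- locpoly(x, y, bandwidth = 0.05, degree = 1)

# locfit with the same nominal value, and with the value from the comment.
fit_005 <- locfit(y ~ lp(x, nn = 0, h = 0.05, deg = 1))
fit_010 <- locfit(y ~ lp(x, nn = 0, h = 0.10, deg = 1))

# Hypothetical helper: root-mean-square gap between a locfit curve and the
# locpoly curve, evaluated at locpoly's grid points.
rms_gap <- function(fit) {
  pred <- predict(fit, newdata = data.frame(x = ref$x))
  sqrt(mean((pred - ref$y)^2, na.rm = TRUE))
}

gap_005 <- rms_gap(fit_005)
gap_010 <- rms_gap(fit_010)
print(c(h_0.05 = gap_005, h_0.10 = gap_010))
```

A clearly smaller gap for h = 0.1 than for h = 0.05 would back up what the plot suggests. Trying a few more values of h in the same way shows where the two curves agree best.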
My sources include the documentation for the locfit package and the documentation for the locpoly function.