 

Understanding loess errors in R

Tags: r, loess

I'm trying to fit a model using loess, and I'm getting errors such as "pseudoinverse used at 3", "neighborhood radius 1", and "reciprocal condition number 0". Here's a MWE:

x = 1:19
y = c(NA, 71.5, 53.1, 53.9, 55.9, 54.9, 60.5, NA, NA, NA,
      NA, NA, 178.0, 180.9, 180.9, NA, NA, 192.5, 194.7)
# direct surface computation, small span, locally linear fits
fit = loess(formula = y ~ x,
            control = loess.control(surface = "direct"),
            span = 0.3, degree = 1)
# predict on a finer grid that extends slightly past the data
x2 = seq(0, 20, 0.1)
library(ggplot2)
qplot(x = x2,
      y = predict(fit, newdata = data.frame(x = x2)),
      geom = "line")

I realize I can fix these errors by choosing a larger span value. However, I'm trying to automate this fit, as I have about 100,000 time series (each of length about 20) similar to this. Is there a way that I can automatically choose a span value that will prevent these errors while still providing a fairly flexible fit to the data? Or, can anyone explain what these errors mean? I did a bit of poking around in the loess() and simpleLoess() functions, but I gave up at the point when C code was called.

asked Dec 17 '14 by random_forest_fanatic

People also ask

What does loess stand for in R?

The name 'loess' is short for local regression (often expanded as 'locally estimated scatterplot smoothing'). It estimates the Y variable at each point using mostly the data near that point. It is also known as a variable-bandwidth smoother, in that it defines 'near' via a nearest-neighbors fraction of the data.

What does loess function do in R?

Loess regression is a common method for smoothing a volatile time series. It is a non-parametric method in which least-squares regression is performed on localized subsets of the data, which makes it a suitable candidate for smoothing almost any numerical vector.
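For instance, a minimal sketch in R on simulated data (the series and the span = 0.3 choice here are just for illustration):

# simulate a volatile series and smooth it with loess
set.seed(1)
x_sim <- 1:100
y_sim <- sin(x_sim / 10) + rnorm(100, sd = 0.2)
sm <- loess(y_sim ~ x_sim, span = 0.3)   # span = fraction of data used per local fit
plot(x_sim, y_sim)
lines(x_sim, predict(sm), col = "red")   # smoothed curve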

What is the difference between loess and lowess?

lowess is for adding a smooth curve to a scatterplot, i.e., for univariate smoothing. loess is for fitting a smooth surface to multivariate data. Both algorithms use locally-weighted polynomial regression, usually with robustifying iterations.
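As a sketch of the interface difference, reusing x and y from the question (rows with NA dropped first; f = 2/3 and span = 0.75 are just the defaults written out explicitly):

d <- na.omit(data.frame(x, y))

# lowess: takes vectors, returns a list of smoothed (x, y) coordinates
lw <- lowess(d$x, d$y, f = 2/3)
plot(d$x, d$y)
lines(lw)

# loess: takes a formula, returns a model object you can predict from
lo <- loess(y ~ x, data = d, span = 0.75)
lines(d$x, predict(lo), col = "red")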


1 Answer

Compare fit$fitted to y and you'll notice that something is wrong with your regression: with this span, loess simply interpolates the data. You need an adequate bandwidth. With too few data points in a neighborhood, a locally linear fit is effectively constant, which triggers collinearity; hence the warnings about pseudoinverses and singularities. You won't see such errors with degree = 0 or with ksmooth. One intelligible, data-driven way to choose span is cross-validation (a rough sketch is at the end of this answer), about which you can ask at Cross Validated.

> fit$fitted
 [1]  71.5  53.1  53.9  55.9  54.9  60.5 178.0 180.9 180.9 192.5 194.7
> y
 [1]    NA  71.5  53.1  53.9  55.9  54.9  60.5    NA    NA    NA    NA    NA 178.0
[14] 180.9 180.9    NA    NA 192.5 194.7

You see an over-fit (in fact, a perfect fit) because the model uses as many equivalent parameters as the effective sample size:

fit
#Call:
#loess(formula = y ~ x, span = 0.3, degree = 1, control = loess.control(surface = "direct"))

#Number of Observations: 11 
#Equivalent Number of Parameters: 11 
#Residual Standard Error: Inf 
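
To see the effect, a quick sketch: refitting with a larger span brings the equivalent number of parameters well below 11, so the curve smooths rather than interpolates (enp and s are standard components of a loess object):

fit2 <- loess(y ~ x, span = 0.75, degree = 1,
              control = loess.control(surface = "direct"))
fit2$enp   # equivalent number of parameters, now well below 11
fit2$s     # residual standard error, now finite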

Or you might as well just use geom_smooth's automated smoothing (again, setting geom_smooth(span = 0.3) throws the same warnings):

ggplot(data=data.frame(x, y), aes(x, y)) + 
  geom_point() + geom_smooth()

[plot: the data points with geom_smooth's default loess curve and confidence band]
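
As for the data-driven span choice mentioned above, here is a rough sketch of leave-one-out cross-validation over a grid of candidate spans. pick_span() is a hypothetical helper written for this answer, not a function from any package; it skips spans where loess fails outright:

pick_span <- function(x, y, spans = seq(0.3, 1, by = 0.05)) {
  d <- na.omit(data.frame(x = x, y = y))
  cv_mse <- sapply(spans, function(s) {
    sq_err <- sapply(seq_len(nrow(d)), function(i) {
      # fit on all points except the i-th; some small spans will fail
      fit <- try(loess(y ~ x, data = d[-i, ], span = s, degree = 1,
                       control = loess.control(surface = "direct")),
                 silent = TRUE)
      if (inherits(fit, "try-error")) return(NA)
      (d$y[i] - predict(fit, newdata = d[i, , drop = FALSE]))^2
    })
    mean(sq_err, na.rm = TRUE)
  })
  spans[which.min(cv_mse)]   # span with the smallest LOO prediction error
}

# e.g. best_span <- pick_span(x, y)

For 100,000 short series this loop is crude but cheap per series, and it only ever returns spans from the candidate grid, so you can set the grid's lower end high enough to avoid the degenerate fits.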

answered Oct 17 '22 by Khashaa