Understanding loess errors in R

Tags:

loess

I'm trying to fit a model using loess, and I'm getting errors such as "pseudoinverse used at 3", "neighborhood radius 1", and "reciprocal condition number 0". Here's a MWE:

x = 1:19
y = c(NA,71.5,53.1,53.9,55.9,54.9,60.5,NA,NA,NA
      ,NA,NA,178.0,180.9,180.9,NA,NA,192.5,194.7)
fit = loess(formula = y ~ x,
        control = loess.control(surface = "direct"),
        span = 0.3, degree = 1)
x2 = seq(0,20,.1)
library(ggplot2)
qplot(x=x2
    ,y=predict(fit, newdata=data.frame(x=x2))
    ,geom="line")

I realize I can fix these errors by choosing a larger span value. However, I'm trying to automate this fit, as I have about 100,000 time series (each of length about 20) similar to this. Is there a way that I can automatically choose a span value that will prevent these errors while still providing a fairly flexible fit to the data? Or, can anyone explain what these errors mean? I did a bit of poking around in the loess() and simpleLoess() functions, but I gave up at the point when C code was called.


asked Dec 17 '14 15:12

random_forest_fanatic

1 Answer

Compare fit$fitted to y. You'll notice that something is wrong with your regression: the fitted values reproduce the observed values exactly. Choose an adequate bandwidth; otherwise loess will just interpolate the data. With too few data points in each neighborhood, a local linear function behaves like a constant over a small bandwidth, which triggers collinearity. That is why you see warnings about pseudoinverses and singularities. You won't see such warnings if you use degree=0 or ksmooth. One sensible, data-driven way to choose the span is cross-validation, which you can ask about on Cross Validated.

> fit$fitted
 [1]  71.5  53.1  53.9  55.9  54.9  60.5 178.0 180.9 180.9 192.5 194.7
> y
 [1]    NA  71.5  53.1  53.9  55.9  54.9  60.5    NA    NA    NA    NA    NA 178.0
[14] 180.9 180.9    NA    NA 192.5 194.7

You see an over-fit (a perfect fit) because the model has as many parameters as the effective sample size.

fit
#Call:
#loess(formula = y ~ x, span = 0.3, degree = 1, control = loess.control(surface = "direct"))

#Number of Observations: 11 
#Equivalent Number of Parameters: 11 
#Residual Standard Error: Inf
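The cross-validation idea mentioned above could be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: loocv_span is a hypothetical helper name, the candidate grid of spans is an arbitrary assumption, and any span whose fit raises a warning or an error in some fold is simply discarded.

```r
# Data from the question.
x <- 1:19
y <- c(NA, 71.5, 53.1, 53.9, 55.9, 54.9, 60.5, NA, NA, NA,
       NA, NA, 178.0, 180.9, 180.9, NA, NA, 192.5, 194.7)

# Hypothetical helper: leave-one-out cross-validation over candidate spans.
# The default grid of spans is an assumption chosen for illustration.
loocv_span <- function(x, y, spans = seq(0.5, 1, by = 0.1), degree = 1) {
  d <- data.frame(x = x, y = y)
  d <- d[complete.cases(d), ]          # drop missing observations
  cv_error <- sapply(spans, function(s) {
    sq_err <- sapply(seq_len(nrow(d)), function(i) {
      tryCatch({
        # Fit without point i, then predict it; surface = "direct"
        # allows prediction when point i lies outside the remaining x range.
        fit <- loess(y ~ x, data = d[-i, ], span = s, degree = degree,
                     control = loess.control(surface = "direct"))
        (d$y[i] - predict(fit, newdata = d[i, , drop = FALSE]))^2
      },
      warning = function(w) NA_real_,  # treat a warning as a failed fold
      error   = function(e) NA_real_)
    })
    mean(sq_err)                       # NA if any fold warned or failed
  })
  if (all(is.na(cv_error))) return(NA_real_)
  spans[which.min(cv_error)]           # which.min() ignores NA entries
}

best <- loocv_span(x, y)
print(best)
```

Because warnings are turned into NA, a span survives only if every leave-one-out fit is clean, which makes the helper conservative. With only about 20 points per series, the cross-validated error is itself noisy, so the selected span is best treated as a rough guide.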

Or you might as well just use the automated geom_smooth. (Again, setting geom_smooth(span=0.3) throws warnings.)

ggplot(data=data.frame(x, y), aes(x, y)) + 
  geom_point() + geom_smooth()
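The degree=0 and ksmooth alternatives mentioned earlier in this answer might look like the sketch below. The ksmooth bandwidth of 5 is an arbitrary assumption picked for illustration, not a recommended value, and the missing observations are dropped by hand before calling ksmooth.

```r
# Data and prediction grid from the question.
x <- 1:19
y <- c(NA, 71.5, 53.1, 53.9, 55.9, 54.9, 60.5, NA, NA, NA,
       NA, NA, 178.0, 180.9, 180.9, NA, NA, 192.5, 194.7)
x2 <- seq(0, 20, by = 0.1)
ok <- !is.na(y)                       # keep only observed points

# Local constant fit: same span as the question, but degree = 0.
fit0 <- loess(y ~ x, span = 0.3, degree = 0,
              control = loess.control(surface = "direct"))
pred0 <- predict(fit0, newdata = data.frame(x = x2))

# Kernel regression smoother from the stats package.
# bandwidth = 5 is an assumed value for illustration only.
ks <- ksmooth(x[ok], y[ok], kernel = "normal",
              bandwidth = 5, x.points = x2)

# Plot the data with both smooths overlaid.
plot(x, y)
lines(x2, pred0, lty = 1)
lines(ks$x, ks$y, lty = 2)
```

Both are local constant smoothers, so there is no local slope to estimate from just a couple of points in a small neighborhood.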


answered Oct 17 '22 10:10

Khashaa
