Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When to choose nls() over loess()?

If I have some (x,y) data, I can easily draw straight-line through it, e.g.

f=glm(y~x)
plot(x,y)
lines(x,f$fitted.values)

But for curvy data I want a curvy line. It seems loess() can be used:

f=loess(y~x)
plot(x,y)
lines(x,f$fitted)

This question has evolved as I've typed and researched it. I started off with wanting to a simple function to fit curvy data (where I know nothing about the data), and wanting to understand how to use nls() or optim() to do that. That was what everyone seemed to be suggesting in similar questions I found. But now I stumbled upon loess() I'm happy. So, now my question is why would someone choose to use nls or optim instead of loess (or smooth.spline)? Using the toolbox analogy, is nls a screwdriver and loess is a power-screwdriver (meaning I'd almost always choose the latter as it does the same thing but with less of my effort)? Or is nls a flat-head screwdriver and loess a cross-head screwdriver (meaning loess is a better fit for some problems, but for others it simply won't do the job)?

For reference, here is the play data I was using that loess gives satisfactory results for:

x=1:40
y=(sin(x/5)*3)+runif(x)

And:

x=1:40
y=exp(jitter(x,factor=30)^0.5)

Sadly, it does less well on this:

x=1:400
y=(sin(x/20)*3)+runif(x)

Can nls(), or any other function or library, cope with both this and the previous exp example, without being given a hint (i.e. without being told it is a sine wave)?

UPDATE: Some useful pages on the same theme on stackoverflow:

Goodness of fit functions in R

How to fit a smooth curve to my data in R?

smooth.spline "out of the box" gives good results on my 1st and 3rd examples, but terrible (it just joins the dots) on the 2nd example. However f=smooth.spline(x,y,spar=0.5) is good on all three.

UPDATE #2: gam() (from mgcv package) is great so far: it gives a similar result to loess() when that was better, and a similar result to smooth.spline() when that was better. And all without hints or extra parameters. The docs were so far over my head I felt like I was squinting at a plane flying overhead; but a bit of trial and error found:

#f=gam(y~x)    #Works just like glm(). I.e. pointless
f=gam(y~s(x)) #This is what you want
plot(x,y)
lines(x,f$fitted)
like image 773
Darren Cook Avatar asked Sep 26 '11 04:09

Darren Cook


2 Answers

Nonlinear-least squares is a means of fitting a model that is non-linear in the parameters. By fitting a model, I mean there is some a priori specified form for the relationship between the response and the covariates, with some unknown parameters that are to be estimated. As the model is non-linear in these parameters NLS is a means to estimate values for those coefficients by minimising a least-squares criterion in an iterative fashion.

LOESS was developed as a means of smoothing scatterplots. It has a very less well defined concept of a "model" that is fitted (IIRC there is no "model"). LOESS works by trying to identify pattern in the relationship between response and covariates without the user having to specify what form that relationship is. LOESS works out the relationship from the data themselves.

These are two fundamentally different ideas. If you know the data should follow a particular model then you should fit that model using NLS. You could always compare the two fits (NLS vs LOESS) to see if there is systematic variation from the presumed model etc - but that would show up in the NLS residuals.

Instead of LOESS, you might consider Generalized Additive Models (GAMs) fitted via gam() in recommended package mgcv. These models can be viewed as a penalised regression problem but allow for the fitted smooth functions to be estimated from the data like they are in LOESS. GAM extends GLM to allow smooth, arbitrary functions of covariates.

like image 115
Gavin Simpson Avatar answered Nov 16 '22 02:11

Gavin Simpson


loess() is non-parametric, meaning you don't get a set of coefficients you can use later - it's not a model, just a fit line. nls() will give you coefficients you could use to build an equation and predict values with a different but similar data set - you can create a model with nls().

like image 37
Josh Avatar answered Nov 16 '22 02:11

Josh