Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

trying to display original and fitted data (nls + dnorm) with ggplot2's geom_smooth()

I am exploring some data, so the first thing I wanted to do was try to fit a normal (Gaussian) distribution to it. This is my first time trying this in R, so I'm taking it one step at a time. First I pre-binned my data:

myhist = data.frame(size = 10:27, counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L) )

qplot(x=size, y=counts, data=myhist)

plot1

Since I want counts, I need to add a normalization factor (N) to scale up the density:

fit = nls(counts ~ N * dnorm(size, m, s), data=myhist, start=c(m=20, s=5, N=sum(myhist$counts)) )   

Then I create the fitted data for display and everything works great:

x = seq(10,30,0.2)
fitted = data.frame(size = x, counts=predict(fit, data.frame(size=x)) )
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_line(data=fitted)

plot2

I got excited when I found this thread which talks about using geom_smooth() to do it all in one step, but I can't get it to work:

Here's what I try... and what I get:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_smooth(method="nls", formula = counts ~ N * dnorm(size, m, s), se=F, start=list(m=20, s=5, N=300, size=10))

Error in method(formula, data = data, weights = weight, ...) : 
  parameters without starting value in 'data': counts

The error seems to indicate that it's trying to fit for the observed variable, counts, but that doesn't make any sense, and it predictably freaks out if I specify a "starting" value for counts too:

fitting parameters ‘m’, ‘s’, ‘N’, ‘size’, ‘counts’ without any variables

Error in eval(expr, envir, enclos) : object 'counts' not found

Any idea what I'm doing wrong? It's not the end of the world, of course, but fewer steps are always better, and you guys always come up with the most elegant solutions to these common tasks.

Thanks in advance!

Jeffrey

like image 721
Jeffrey Breen Avatar asked Dec 07 '10 22:12

Jeffrey Breen


1 Answers

the first error indicates that ggplot2 cannot find the variable 'count', which is used in formula, in data.

Stats take place after mapping, that is, size -> x, and counts -> y.

Here is an example for using nls in geom_smooth:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + 
  geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=F, 
              start=list(m=20, s=5, N=300)) 

The point is that using x and y, instead of size and counts, in the specification of formula.

like image 121
kohske Avatar answered Sep 18 '22 17:09

kohske