Having problems with nonlinear regression in ggplot

Question

attenuation = data.frame(km =      
c(0,0,0.4,0.4,0.8,0.8,1.2,1.2,1.6,1.6,2,2,2.4,2.4,2.8,2.8,3.2,3.2,3.6,3.6,4, 
4,4.4,4.4,4.8,4.8,5.2,5.2,5.6,5.6,6,6,6.4,6.4,6.8,6.8,7.2,7.2,7.6,7.6,8,8, 
11.7,11.7,13,13), edna = c(76000,20000,0,0,6000,0,0,6880,10700,0,6000,
0,0,0,0,0,0,6000,0,0,0,0,0,0,0,0,6310,0,6000,6000,0,0,0,0,0,
0,0,0,0,0,0,6000,0,0,0,0))

#This worked great for a linear regression
ggplot(attenuation, aes(x = km, y = edna)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
xlab("Distance from Cage (km)") +
ylab("eDNA concentration (gene sequence/Liter)")

But the linear regression doesn't seem to be a good fit (r squared =0.09). So I'd like to try something else. I tried some other regressions also with poor fits, so I'd like to try a nonlinear regression.

I have researched this question on stack overflow and tried a number of different options, but nothing is working. The option I provide below makes the most sense-but I wonder if I have the formula wrong? Or if the start list needs to be modified?

For context I am trying to explore the relationship between river distance and concentration.

#This is not working for a nonlinear regression
ggplot(attenuation, aes(x = km, y = edna))+ 
geom_point() + 
stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args=list (start = 
list(a = 1,b=1), se=FALSE))

I get the following error from r when I run the code for nls above Computation failed in stat_smooth(): variable lengths differ (found for '(se)')

dcarlson · Accepted Answer

You have 2 problems. First a missplaced ")" since se=FALSE is an argument to stat_smooth=, not method.args=:

ggplot(attenuation, aes(x = km, y = edna))+ 
  geom_point() + 
  stat_smooth(method='nls', formula='y~a*x^b', method.args=list(start = 
     list(a=1, b=1)), se=FALSE)

But this will not work either because your model is impossible to fit to your data. Look at the equation. When x=0, y will equal 0. For values of x greater than 0, y will increase unless b is negative, but but then x=0 is Inf so the algorithm fails to try negative values. Since you have a decreasing relationship, you need to specify a function that is defined for x=0 and plausible starting values. This one parameter fits your data better than a linear function (it could also be defined as a*(x + 1)^-1 which is essentially your function with 1 added to x so that it is defined at x=0:

ggplot(attenuation, aes(x = km, y = edna))+ 
   geom_point() + 
   stat_smooth(method = 'nls', formula = 'y~a/(x + 1)', 
      method.args=list(start=list(a=50000)), se=FALSE)

[ One parameter[1]

I picked 50000 by splitting the difference between 20,000 and 76,000. The final estimate is about 20,000. You can bend the curve more sharply by adding a second parameter, but you have so many 0 values it may be too much depending on what you are trying to communicate:

ggplot(attenuation, aes(x = km, y = edna))+ 
   geom_point() + 
   stat_smooth(method='nls', formula='y~a*(1+x)^b', method.args=list(start = 
      list(a=50000, b=-1)), se=FALSE)

Two parameters

Ben Bolker · Answer

I agree with @dcarlson's answer. You've got a pretty small data set here (a total of 11 non-zero data points, two of which fall on top of each other) so you probably shouldn't push any conclusions too hard. The first two points are definitely large, and there might be a mild declining trend after that, but beyond that you can't say too much.

If you want to do the power-law fit you have to displace the zero-km data point from the origin. I've done it by adding 0.1 to the x values. This is an arbitrary choice on my part and should be thought about carefully on your end ... (note that there's a large difference in the results if you add 0.1 as I did or 1 as @dcarlson did). I also had to put in more reasonable starting values, which I did by fitting a log-log linear regression (lm(log(edna) ~ log(km+0.1), data=attenuation)) and extracting the coefficients (which were approximately 4 and -1.5).

ggplot(attenuation, aes(x = km, y = edna))+ 
  geom_point() + 
  stat_smooth(method = 'nls', formula = 'y~a*(x+0.1)^b',
              method.args=list (start = list(a = exp(4),b=-1.5)), se=FALSE)

You can also do this slightly more efficiently with a log-link Gaussian GLM as follows (you still need to displace the x-values from zero). I also added some code to disambiguate the repeated points.

ggplot(attenuation, aes(x = km, y = edna))+ 
   stat_sum() + 
   geom_smooth(method="glm", formula=y~log(x+0.1),
          method.args=list(family=gaussian(link="log"),
                           start=c(4,-1.5)))+
   scale_size(breaks=c(1,2),range=c(1,3))

Having problems with nonlinear regression in ggplot

Tags:

r

ggplot2

Ellen

2 Answers

dcarlson

Ben Bolker

Recent Activity

Donate For Us

Having problems with nonlinear regression in ggplot

Tags:

r

ggplot2

Ellen

2 Answers

dcarlson

Ben Bolker

Related questions

Recent Activity

Donate For Us