Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In ggplot restrict y to be >0 in LOESS

Tags:

plot

r

ggplot2

axis

Here's my code:

#data
sites <- 
  structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L), 
                 date = c(13493L, 13534L, 13566L, 13611L, 13723L,
                          13752L, 13804L, 13837L, 13927L, 14028L,
                          14082L, 14122L, 14150L, 14182L, 14199L,
                          16198L, 16279L, 16607L, 16945L, 17545L,
                          17650L, 17743L, 17868L, 17941L, 18017L, 18092L),
                 y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 
                       11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L,
                       6L, 1L, 0L, 5L, 0L)), 
            .Names = c("site", "date", "y"),
            class = "data.frame", row.names = c(NA, -26L))

#convert to date
x<-as.Date(sites$date, origin="1960-01-01") 

#plot smooth, line goes below zero!
qplot(data=sites, x, y, main="Site 349") 
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5))

Some of the LOESS lines and confidence intervals go below zero, I would like to restrict the graphics to 0 and positive numbers (because negative do not make sense). enter image description here

How can I do this?

like image 600
Nate Avatar asked May 05 '10 21:05

Nate


2 Answers

I am seconding Matt Parker's suggestion that you have to change the fitting procedure. One option that often works for positive-only data, is to do the fit on log-scale, and then exponentiate to get results on the original scale. This will guarantee positive only values.

Generating random data that has some of this issues:

 d <- data.frame(x=0:100)
 d$y <- exp(rnorm(nrow(d), mean=-d$x/40, sd=0.8))
 qplot(x,y,data=d) + stat_smooth() 

Now we can use ggplot's transformation capabilites to log-transform the y-values, but display the results on an exponential scale (which corresponds to the original one):

qplot(x,y,data=d) + stat_smooth() + scale_y_log10()+coord_trans(ytrans="pow10")

You can see examples like this on the coord_trans help page. If you don't like the y-axis, you can manipulate the breaks and labels.

EDIT based on the question update

There have been some changes in ggplot2 since the question was originally asked, and the original answer did not deal with 0's.

Option 1

The main idea of the solution is the same: find a transformation that will map the range of possible values to -Inf to Inf, do the loess smooth there, and then backtransform the result. The log-transformation would be great if there were no zeroes. I don't think the required function exists if 0 is included, but a possibility that often works is the log(1+x) transformation. That is built-in, but we need to have the inverse transformation exp(x)-1 as well.

library(scales)
#create exp(x)-1 transformation, the inverse of log(1+p)
expm1_trans <-  function() trans_new("expm1", "expm1", "log1p")

qplot(x, y, data=sites) + stat_smooth(method="loess") +
  scale_y_continuous(trans=log1p_trans()) +
  coord_trans(ytrans=expm1_trans())

Loess fit on log(1+x) transformed data

Option 2

The second option extends the suggestion in the comments to Matt Parker's answer: use a regression method that incorporates the integer nature of the outcomes. That means overdispersed (just in case) Poisson regression for counts. While you can't do loess, you can do a spline fit. You can play with the degrees of freedom to control the smoothness.

library(splines)
qplot(x, y, data=sites) + stat_smooth(method="glm", family="quasipoisson", 
                                      formula = y ~ ns(x, 3))

Spline fit using overdispersed Poisson regression

The two options give quite similar results, which is a good thing.

like image 180
Aniko Avatar answered Nov 13 '22 18:11

Aniko


I can't test it without some example data, but

qplot(data=sites, x, y, main="Site 349")  
(p <- qplot(data = sites, x, y, xlab = "", ylab = "")) 
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5)) 
p1 + theme_bw() + opts(title = "Site 349") + ylim(0, foo)

(where foo is a suitable upper limit for your plot) might do the trick. Unlike in base graphics, the xlim() and ylim() commands in ggplot actually restrict the data that are used in making the plot, rather than just the plot window. It might also restrict the geom_smooth() (though I'm not certain).

Edit: After reading a bit more, you might also want to consider switching out the model that geom_smooth is using. Again, not being able to see your data is a problem. But, for example, if it's binary - you can add stat_smooth(method="glm", family="binomial") to get a logit-smoothed line. See ?stat_smooth for more.

like image 34
Matt Parker Avatar answered Nov 13 '22 18:11

Matt Parker