
Order of factor levels changes when plotting layers with data subsets

Tags: plot, r, ggplot2

I am trying to control the order of items in a legend in a ggplot2 plot in R. I looked up some similar questions and learned about changing the order of the levels of the factor variable I am plotting. I am plotting data for four months: December, January, June, and July.

If I use a single plot command for all the months, it works as expected: the months appear in the legend in the order of the factor levels. However, I need a different dodge value for the summer (June & July) and winter (Dec & Jan) data, so I use two geom_pointrange commands. When I split the plot into these two steps, the legend order goes back to alphabetical. You can demonstrate this by commenting out either the "plot summer" or the "plot winter" command.

What can I change to keep my factor level order in the legend?

Please ignore the odd looking test data - the real data looks fine in this plot format.

library(ggplot2)

#testdata
hour <- rep(seq(from=1,to=24,by=1),4)
avg_hou <- sample(seq(0,0.5,0.001),96,replace=TRUE)
lower_ci <- avg_hou - sample(seq(0,0.05,0.001),96,replace=TRUE)
upper_ci <- avg_hou + sample(seq(0,0.05,0.001),96,replace=TRUE)
Month <- c(rep("December",24), rep("January",24), rep("June",24), rep("July",24))

testdata <- data.frame(Month,hour,avg_hou,lower_ci,upper_ci)
testdata$Month <- factor(testdata$Month,levels=c("June", "July", "December","January"))

#basic plot setup
plotx <- ggplot(testdata, aes(x = hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci, color = Month, shape = Month))
plotx <- plotx + scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",  "December" = "#92C5DE", "January" = "#0571B0"))

#plot summer
plotx  <- plotx + geom_pointrange(data = testdata[testdata$Month == "June" | testdata$Month == "July",], size = 1, position=position_dodge(width=0.3)) 
#plot winter
plotx  <- plotx + geom_pointrange(data = testdata[testdata$Month == "December" | testdata$Month == "January",], size = 1, position=position_dodge(width=0.6))

print(plotx)
Scott asked Dec 04 '13 22:12



2 Answers

Another way to think about "dodge" is as an offset from the x-values based on group (in this case Month). So if we add a dodge (x-offset) column to your original data, based on month:

library(ggplot2)

# your original sample data
# note the use of set.seed(...) so "random" data is reproducible
set.seed(1)
hour     <- rep(seq(from=1,to=24,by=1),4)
avg_hou  <- sample(seq(0,0.5,0.001),96,replace=TRUE)
lower_ci <- avg_hou - sample(seq(0,0.05,0.001),96,replace=TRUE)
upper_ci <- avg_hou + sample(seq(0,0.05,0.001),96,replace=TRUE)
Month    <- c(rep("December",24), rep("January",24), rep("June",24), rep("July",24))
testdata       <- data.frame(Month,hour,avg_hou,lower_ci,upper_ci)
testdata$Month <- factor(testdata$Month,levels=c("June", "July", "December","January"))

# add offset column for dodge
testdata$dodge <- -2.5+(as.integer(testdata$Month))

# create ggplot object and default mappings
ggp <- ggplot(testdata, aes(x=hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci, color = Month, shape = Month))
ggp <- ggp + scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101", "December" = "#92C5DE", "January" = "#0571B0"))

# plot the point range
ggp + geom_pointrange(aes(x=hour+0.2*dodge), size=1)

Produces this:

This does not require geom_blank(...) to maintain the scale order, and it does not require two calls to geom_pointrange(...).
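The same offset idea can also give the summer and winter months different spacing, which is what the question asked for. The following is a minimal sketch, not part of the original answer. The `offsets` lookup vector, the `x_dodged` column, and the name `p_offset` are hypothetical, and the values ±0.075 and ±0.15 assume that position_dodge(width = w) places two groups at roughly ±w/4 around x, so adjust them as needed.

```r
library(ggplot2)

# Reproducible sample data in the same shape as the question's test data
set.seed(1)
month_levels <- c("June", "July", "December", "January")
testdata <- data.frame(
  Month   = factor(rep(c("December", "January", "June", "July"), each = 24),
                   levels = month_levels),
  hour    = rep(1:24, times = 4),
  avg_hou = sample(seq(0, 0.5, 0.001), 96, replace = TRUE)
)
testdata$lower_ci <- testdata$avg_hou - sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$upper_ci <- testdata$avg_hou + sample(seq(0, 0.05, 0.001), 96, replace = TRUE)

# Hypothetical per-month x offsets (assumption): narrow spacing for summer,
# wide spacing for winter
offsets <- c(June = -0.075, July = 0.075, December = -0.15, January = 0.15)
testdata$x_dodged <- testdata$hour + unname(offsets[as.character(testdata$Month)])

# A single layer sees every factor level, so the legend follows the level order
p_offset <- ggplot(testdata, aes(x = x_dodged, y = avg_hou,
                                 ymin = lower_ci, ymax = upper_ci,
                                 color = Month, shape = Month)) +
  geom_pointrange(size = 1) +
  scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",
                                "December" = "#92C5DE", "January" = "#0571B0")) +
  labs(x = "hour")

print(p_offset)
```

Because all four months are drawn in one layer, the scale is built from the full factor, and only the x positions differ between seasons.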

jlhoward answered Oct 15 '22 19:10



One possibility is to add a geom_blank as the first layer in the plot. From ?geom_blank: "The blank geom draws nothing, but can be a useful way of ensuring common scales between different plots." We tell the geom_blank layer to use the entire data set, so it sets up a scale that includes all levels of 'Month', correctly ordered. Then add the two geom_pointrange layers, each of which uses a subset of the data.

This is perhaps a matter of taste in this particular case, but I prefer to prepare the data sets before using them in ggplot.

df_sum <- testdata[testdata$Month %in% c("June", "July"), ]
df_win <- testdata[testdata$Month %in% c("December", "January"), ]

ggplot(data = testdata, aes(x = hour, y = avg_hou, ymin = lower_ci, ymax = upper_ci,
       color = Month, shape = Month)) +
  geom_blank() +
  geom_pointrange(data = df_sum, size = 1, position = position_dodge(width = 0.3)) +
  geom_pointrange(data = df_win, size = 1, position = position_dodge(width = 0.6)) +
  scale_color_manual(values = c("June" = "#FDB863", "July" = "#E66101",
                     "December" = "#92C5DE", "January" = "#0571B0"))
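If the legend order still needs to be pinned down, another option is to state it explicitly on the scales. This is a sketch and not part of the original answer: it relies on the `limits` argument of a discrete scale, which sets both the values shown and their order. The names `month_levels`, `month_colors`, and `p_limits` are introduced here for illustration only.

```r
library(ggplot2)

# Reproducible sample data in the same shape as the question's test data
set.seed(1)
month_levels <- c("June", "July", "December", "January")
month_colors <- c(June = "#FDB863", July = "#E66101",
                  December = "#92C5DE", January = "#0571B0")
testdata <- data.frame(
  Month   = factor(rep(c("December", "January", "June", "July"), each = 24),
                   levels = month_levels),
  hour    = rep(1:24, times = 4),
  avg_hou = sample(seq(0, 0.5, 0.001), 96, replace = TRUE)
)
testdata$lower_ci <- testdata$avg_hou - sample(seq(0, 0.05, 0.001), 96, replace = TRUE)
testdata$upper_ci <- testdata$avg_hou + sample(seq(0, 0.05, 0.001), 96, replace = TRUE)

# Season subsets; subsetting rows keeps all four factor levels
df_sum <- testdata[testdata$Month %in% c("June", "July"), ]
df_win <- testdata[testdata$Month %in% c("December", "January"), ]

# Give both legend-producing scales the same explicit limits so the
# color and shape legends stay merged and follow month_levels
p_limits <- ggplot(testdata, aes(x = hour, y = avg_hou,
                                 ymin = lower_ci, ymax = upper_ci,
                                 color = Month, shape = Month)) +
  geom_pointrange(data = df_sum, size = 1, position = position_dodge(width = 0.3)) +
  geom_pointrange(data = df_win, size = 1, position = position_dodge(width = 0.6)) +
  scale_color_manual(values = month_colors, limits = month_levels) +
  scale_shape_discrete(limits = month_levels)

print(p_limits)
```

Compared with geom_blank, this spells the order out in the scale itself, at the cost of repeating the level vector for every mapped aesthetic.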


Henrik answered Oct 15 '22 19:10
