
Order of legend entries in ggplot2 barplots with coord_flip()

Tags:

r

ggplot2

I'm struggling to get the right ordering of variables in a graph I made with ggplot2 in R.

Suppose I have a dataframe such as:

set.seed(1234)
my_df<- data.frame(matrix(0,8,4))
names(my_df) <- c("year", "variable", "value", "vartype")
my_df$year <- rep(2006:2007)
my_df$variable <- c(rep("VX",2),rep("VB",2),rep("VZ",2),rep("VD",2))
my_df$value <- runif(8, 5,10) 
my_df$vartype<- c(rep("TA",4), rep("TB",4))

which yields the following table:

  year variable    value vartype
1 2006       VX 5.568517      TA
2 2007       VX 8.111497      TA
3 2006       VB 8.046374      TA
4 2007       VB 8.116897      TA
5 2006       VZ 9.304577      TB
6 2007       VZ 8.201553      TB
7 2006       VD 5.047479      TB
8 2007       VD 6.162753      TB

There are four variables (VX, VB, VZ and VD), belonging to two groups of variable types, (TA and TB).

I would like to plot the values as horizontal bars on the y axis, ordered vertically first by variable groups and then by variable names, faceted by year, with values on the x axis and fill colour corresponding to variable group. (i.e. in this simplified example, the order should be, top to bottom, VB, VX, VD, VZ)

1) My first attempt has been to try the following:

ggplot(my_df,        
    aes(x=variable, y=value, fill=vartype, order=vartype)) +
       # adding or removing the aesthetic "order=vartype" doesn't change anything
     geom_bar() + 
     facet_grid(. ~ year) + 
     coord_flip()

However, the variables are listed in reverse alphabetical order rather than by vartype: the order=vartype aesthetic is ignored.


2) Following an answer to a similar question I posted yesterday, I tried the following, based on the post "Order Bars in ggplot2 bar graph":

my_df$variable <- factor(
  my_df$variable, 
  levels=rev(sort(unique(my_df$variable))), 
  ordered=TRUE
)

This approach does get the variables in vertical alphabetical order in the plot, but ignores the fact that the variables should be ordered first by variable groups (with TA-variables on top and TB-variables below).


3) The following gives the same as 2 (above):

my_df$vartype <- factor(
  my_df$vartype, 
  levels=sort(unique(my_df$vartype)), 
  ordered=TRUE
)

... which has the same issues as the first approach (variables listed in reverse alphabetical order, groups ignored)

4) Another approach, based on the original answer to "Order Bars in ggplot2 bar graph", also gives the same plot as 2, above:

my_df <- within(my_df, 
                vartype <- factor(vartype, 
                levels=names(sort(table(vartype),
                decreasing=TRUE)))
                ) 

I'm puzzled by the fact that, despite several approaches, the aesthetic order=vartype is ignored. Still, it seems to work in an unrelated problem: http://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/

I hope that the problem is clear and welcome any suggestions.

Matteo

I posted a similar question yesterday but, unfortunately, I made several mistakes when describing the problem and providing a reproducible example. I've listened to several suggestions since, thoroughly searched Stack Overflow for similar questions and applied, to the best of my knowledge, every suggested combination of solutions, to no avail. I'm posting the question again hoping to be able to solve my issue and, hopefully, be helpful to others.

MatteoS asked Sep 04 '11 13:09


1 Answer

This has little to do with ggplot, but is instead a question about generating an ordering of variables to use to reorder the levels of a factor. Here is your data, built more compactly with a single data.frame() call and rep():

set.seed(1234)
df2 <- data.frame(year = rep(2006:2007), 
                  variable = rep(c("VX","VB","VZ","VD"), each = 2),
                  value = runif(8, 5,10),
                  vartype = rep(c("TA","TB"), each = 4))

Note that this way variable and vartype are factors, because data.frame() converted strings to factors by default at the time of writing (from R 4.0.0 on you need to pass stringsAsFactors = TRUE to get this). If they aren't factors, ggplot() will coerce them and you are left with alphabetical ordering. I have said this before and will no doubt say it again: get your data into the correct format before you start plotting or doing data analysis.
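As a minimal sketch of that advice, the column classes can be checked and the grouping columns converted to factors explicitly before plotting. The explicit stringsAsFactors = FALSE here is only an assumption made to mimic the default of R 4.0.0 and later; it is not part of the original code.

```r
set.seed(1234)
# Same data as above; strings are deliberately kept as character here
# (assumption: this mimics the default behaviour of R >= 4.0.0)
df2 <- data.frame(year = rep(2006:2007, times = 4),
                  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
                  value = runif(8, 5, 10),
                  vartype = rep(c("TA", "TB"), each = 4),
                  stringsAsFactors = FALSE)

# Inspect the class of every column before doing anything else
sapply(df2, class)

# Convert the grouping columns to factors explicitly
df2$variable <- factor(df2$variable)
df2$vartype  <- factor(df2$vartype)

# Default levels are the sorted unique values: VB, VD, VX, VZ
levels(df2$variable)
```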

You want the following ordering:

> with(df2, order(vartype, variable))
[1] 3 4 1 2 7 8 5 6

where you should note that we get the ordering by vartype first and only then by variable within the levels of vartype. If we use this to reorder the levels of variable we get:

> with(df2, reorder(variable, order(vartype, variable)))
[1] VX VX VB VB VZ VZ VD VD
attr(,"scores")
 VB  VD  VX  VZ 
1.5 5.5 3.5 7.5 
Levels: VB VX VD VZ

(ignore the attr(,"scores") bit and focus on the Levels). This has the right ordering, but ggplot() will draw them bottom to top and you wanted top to bottom. I'm not sufficiently familiar with ggplot() to know if this can be controlled, so we will also need to reverse the ordering using decreasing = TRUE in the call to order().
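One caveat on the reorder() call: order() returns row indices, not ranks. For this data the two happen to coincide, which is why the scores come out as intended, but that need not hold for other row arrangements. A sketch that avoids relying on this reads the level order straight off the sorted rows; the names idx and lev are only illustrative.

```r
set.seed(1234)
df2 <- data.frame(year = rep(2006:2007, times = 4),
                  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
                  value = runif(8, 5, 10),
                  vartype = rep(c("TA", "TB"), each = 4))

# Row indices sorted by vartype first, then by variable within vartype
idx <- with(df2, order(vartype, variable))

# Variable names in that sorted order, one entry per variable
lev <- unique(as.character(df2$variable[idx]))
lev  # VB VX VD VZ

# Reverse the levels so the first one ends up at the top
# when bars are drawn bottom to top
df2$variable <- factor(df2$variable, levels = rev(lev))
levels(df2$variable)
```

Setting the levels directly also makes the intended order visible in the code, at the cost of one extra line compared with reorder().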

Putting this all together we have:

## reorder `variable` on `variable` within `vartype`
df3 <- transform(df2, variable = reorder(variable, order(vartype, variable,
                                                         decreasing = TRUE)))

Which when used with your plotting code:

ggplot(df3, aes(x=variable, y=value, fill=vartype)) +
       geom_bar() + 
       facet_grid(. ~ year) + 
       coord_flip()

produces this:

[Figure: reordered barplot]
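With a recent ggplot2, geom_bar() counts rows by default and, as far as I know, rejects a y aesthetic, so the plotting code above may stop with an error. A hedged sketch of the same plot, assuming a ggplot2 version that provides geom_col() (geom_bar(stat = "identity") should be equivalent):

```r
library(ggplot2)

set.seed(1234)
df2 <- data.frame(year = rep(2006:2007, times = 4),
                  variable = rep(c("VX", "VB", "VZ", "VD"), each = 2),
                  value = runif(8, 5, 10),
                  vartype = rep(c("TA", "TB"), each = 4))

# Reorder `variable` on `variable` within `vartype`, reversed for coord_flip()
df3 <- transform(df2, variable = reorder(variable,
                                         order(vartype, variable,
                                               decreasing = TRUE)))

# geom_col() plots the values in `value` directly instead of counting rows
p <- ggplot(df3, aes(x = variable, y = value, fill = vartype)) +
  geom_col() +
  facet_grid(. ~ year) +
  coord_flip()
p
```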

Gavin Simpson answered Oct 11 '22 13:10