
sort bar chart by sum of values in ggplot

Tags:

r

Example data:

player <- c("a", "b", "a", "b", "c", 
            "a", "a", "b", "c", "b", 
            "c", "a", "c", "c", "a")
is.winner <- c(TRUE, FALSE, TRUE, TRUE, TRUE, 
               FALSE, TRUE, TRUE, TRUE, FALSE, 
               TRUE, TRUE, TRUE, TRUE, FALSE)

df <- data.frame(player, is.winner)

My first graph is produced by the following code:

library(ggplot2)

# `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
ggplot(data=df, aes(x=player, y=as.numeric(is.winner))) +
  geom_bar(stat="summary", fun=sum) +
  coord_flip()

What I would like to do is sort the df$player axis by the number of TRUE values per player, so that the bars appear in order of total wins.

I realize I could use something like this:

df$player <- factor(df$player, levels=c("b", "a", "c"))

But the actual data has far more player names. In addition, I would like to do something similar with win percentages, etc., so automatic sorting would be great. An example of the win-percentage chart is below:

df$is.winner <- factor(df$is.winner, levels=c("TRUE", "FALSE"))
df$player <- factor(df$player, levels=c("c", "b", "a"))

library(scales)
library(RColorBrewer)
ggplot(data=df, aes(x=player)) +
  geom_bar(aes(fill=is.winner),position='fill')+
  scale_y_continuous(labels=percent)+
  scale_fill_brewer(palette="Set2") +
  coord_flip()
asked Nov 09 '15 16:11 by tastycanofmalk


1 Answer

You can use reorder, a function that reorders a factor's levels according to a summary statistic computed on a second variable for each level.

# `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
ggplot(data=df, aes(x=reorder(player, is.winner, sum),
                    y=as.numeric(is.winner))) +
  geom_bar(stat="summary", fun=sum) +
  coord_flip()


reorder(x, X, FUN) takes:

  • x, the factor (or a vector that is treated as one) to reorder.
  • X, a vector of the same length as x. This vector is split into one subset per level of x, and each subset is passed to FUN.
  • FUN, the function to apply to each level's subset (mean by default). It should take a vector and return a single value; the levels are then sorted in increasing order of that value.
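
To see what reorder does independently of the plot, here is a minimal sketch in base R using the example data from the question. The names wins, by_wins and by_wins_desc are made up for illustration, and negating X to reverse the order is a common trick rather than something taken from the answer itself.

```r
# Example data from the question
player <- c("a", "b", "a", "b", "c",
            "a", "a", "b", "c", "b",
            "c", "a", "c", "c", "a")
is.winner <- c(TRUE, FALSE, TRUE, TRUE, TRUE,
               FALSE, TRUE, TRUE, TRUE, FALSE,
               TRUE, TRUE, TRUE, TRUE, FALSE)

# Number of wins per player: this is what FUN = sum computes for each level
wins <- tapply(is.winner, player, sum)
print(wins)

# Levels sorted by increasing number of wins
by_wins <- reorder(player, is.winner, FUN = sum)
levels(by_wins)
# "b" "a" "c"

# Negating X reverses the order (most wins first)
by_wins_desc <- reorder(player, -is.winner, FUN = sum)
levels(by_wins_desc)
# "c" "a" "b"
```

The first result matches the manual ordering levels=c("b", "a", "c") from the question, but it is derived from the data instead of being typed by hand.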

In your last example, is.winner has been converted to a factor, so you need to turn it back into a logical vector before it can be summed:

df$is.winner <- factor(df$is.winner, levels=c("TRUE", "FALSE"))

# Refer to the column by name inside aes() rather than through df$
ggplot(data=df, aes(x=reorder(player, is.winner=="TRUE", sum), fill=is.winner)) +
  geom_bar(position='fill') +
  scale_y_continuous(labels=percent) +
  scale_fill_brewer(palette="Set2") +
  xlab("player") + 
  coord_flip()

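
The call above orders players by their number of wins. If the goal is to order by win percentage instead, which the question also mentions, one possible variation is to pass mean as FUN, since the mean of a logical vector is the proportion of TRUE values. This is a sketch and not part of the original answer: it assumes ggplot2 and scales are installed, and it rebuilds df from scratch so that is.winner is still logical when reorder is called. The name p is illustrative.

```r
library(ggplot2)
library(scales)

# Example data from the question
df <- data.frame(
  player = c("a", "b", "a", "b", "c",
             "a", "a", "b", "c", "b",
             "c", "a", "c", "c", "a"),
  is.winner = c(TRUE, FALSE, TRUE, TRUE, TRUE,
                FALSE, TRUE, TRUE, TRUE, FALSE,
                TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Order players by win rate while is.winner is still logical
df$player <- reorder(df$player, df$is.winner, FUN = mean)

# Convert to a factor afterwards so that TRUE comes first in the fill order
df$is.winner <- factor(df$is.winner, levels = c("TRUE", "FALSE"))

# Build the stacked percentage chart; the x axis follows the new level order
p <- ggplot(df, aes(x = player, fill = is.winner)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent) +
  scale_fill_brewer(palette = "Set2") +
  coord_flip()
```

Printing p draws the chart. With this data both orderings give b, a, c, but they can differ when players have played different numbers of games.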

answered Oct 17 '22 05:10 by while