
How to use percentage as label in stacked bar plot?

Tags: r, ggplot2

I'm trying to display percentages as labels inside the bars of a stacked bar plot in ggplot2. I found a post from three years ago, "How to draw stacked bars in ggplot2 that show percentages based on group?", but I'm not able to reproduce its answer.

The answer to that post is almost exactly what I'm trying to do.

Here is a simple example of my data:

library(ggplot2)
df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))
ggplot(data=df, aes(x=sample, fill=class)) +
    coord_flip() +
    geom_bar(position=position_fill(reverse=TRUE), width=0.7)


I'd like every bar segment to show its percentage/fraction, so in this case they would all be 33%. Ideally the values would be calculated on the fly, but I can also supply the percentages manually if necessary. Can anybody help?

Side question: How can I reduce the space between the bars? I found many answers to that as well but they suggest using the width parameter in position_fill(), which doesn't seem to exist anymore.

Thanks so much!

EDIT:

So far, there are two answers that show exactly what I was asking for (big thanks for responding so quickly); however, they fail when I apply them to my real data. Here is the example data with one more element added to show what happens:

df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4','cond1'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3','class2'))

Essentially, I'd like to have only one label per class/condition combination.

fakechek asked Feb 04 '23 15:02


2 Answers

I think what OP wanted was labels on the individual sections of the bars. We can do this by using data.table to compute the share of each class within each sample, both as a number and as a formatted label, and then plotting with ggplot:

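library(data.table)
library(ggplot2)
library(scales)
# Count rows per sample/class, then convert the counts to shares within each sample.
# format = "f" avoids scientific notation (e.g. "1e+02") when a share is 100.
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
                percent_fmt = paste0(formatC(count*100/sum(count), format = "f", digits = 0), "%"),
                percent_num = count/sum(count)
                ), by = sample]

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  # reverse = TRUE keeps the labels in the same stacking order as the bars
  geom_text(aes(label = percent_fmt), position = position_stack(vjust = 0.5, reverse = TRUE)) + coord_flip()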

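For readers without data.table, the same per-sample percentages can be sketched in base R. This is only an illustration, not the method used in this answer; it assumes the 13-row data from the question's EDIT, and the names df_edit, counts, pct, and pct_labels are made up for the example.

```r
# Example data from the question's EDIT (13 rows; cond1/class2 occurs twice).
df_edit <- data.frame(
  sample = c(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3), "cond1"),
  class  = c(rep(c("class1", "class2", "class3"), times = 4), "class2")
)

# Count each sample/class combination, then divide every row by its row total.
counts <- table(df_edit$sample, df_edit$class)
pct <- prop.table(counts, margin = 1)

# One formatted label per sample/class combination.
pct_labels <- matrix(paste0(round(100 * pct), "%"),
                     nrow = nrow(pct), dimnames = dimnames(pct))
print(pct_labels)
```

Because the counts are aggregated before any labels are built, each sample/class combination gets exactly one label, which is what the EDIT in the question asks for.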

Edit: Another solution, where the y-value of each label is calculated during the aggregation as the midpoint of its bar segment. This way we don't have to rely on position_stack(vjust = 0.5):

dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
               # format = "f" avoids scientific notation (e.g. "1e+02") when a share is 100
               percent_fmt = paste0(formatC(count*100/sum(count), format = "f", digits = 0), "%"),
               percent_num = count/sum(count),
               cum_pct = cumsum(count/sum(count)),
               # midpoint of each segment: its upper edge minus half its height
               label_y = cumsum(count/sum(count)) - count/(2*sum(count))
), by = sample]

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  geom_text(aes(label = percent_fmt, y = label_y)) + coord_flip()
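
The label position used here is the midpoint between the lower and upper edge of each segment. A small worked sketch in base R, assuming the shares of cond1 from the question's EDIT data (25%, 50%, 25%); the variable names are illustrative only:

```r
# Hypothetical per-class shares for one bar, listed in stacking order
# starting from the axis.
pct <- c(class1 = 0.25, class2 = 0.50, class3 = 0.25)

# Upper and lower edge of each segment on the 0..1 scale.
upper <- cumsum(pct)
lower <- c(0, head(upper, -1))

# The label goes halfway between the two edges.
label_y <- (lower + upper) / 2

# Equivalent shortcut: upper edge minus half the segment height.
label_y_alt <- upper - pct / 2

print(label_y)
```

This only lines up with the bars if the rows are in the same order in which ggplot stacks the classes, which is why the bars are drawn with reverse = TRUE.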
Mike H. answered Feb 08 '23 06:02


Here is a solution where you first calculate the percentages using dplyr and then plot them:

UPDATED:

# Only needed on R < 4.0.0, where data.frame() converted strings to factors by default.
options(stringsAsFactors = FALSE)

df = data.frame(sample = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'), 
                class = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))

library(dplyr) 
library(ggplot2)
library(scales)

df%>%
  # count how often each class occurs in each sample.
  count(sample, class)%>% 
  group_by(sample)%>%
  # share of each class within its sample
  mutate(pct = n / sum(n))%>%
  ggplot(aes(x = sample, y = pct, fill = class)) + 
  coord_flip() +
  geom_col(width=0.7)+
  geom_text(aes(label = paste0(round(pct * 100), '%')),
            position = position_stack(vjust = 0.5))

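On the side question about spacing: in the code above, the gap between bars is governed by the width argument of geom_col() (or geom_bar()), not by position_fill(). A minimal sketch, assuming ggplot2 is installed; the data frame shares and the value 0.95 are illustrative choices only, and widths close to 1 leave almost no gap.

```r
library(ggplot2)

# Pre-computed shares for two bars (illustrative values, not real data).
shares <- data.frame(
  sample = rep(c("cond1", "cond2"), each = 2),
  class  = rep(c("class1", "class2"), times = 2),
  pct    = c(0.25, 0.75, 0.5, 0.5)
)

# width is the bar thickness as a fraction of the spacing between categories,
# so a larger width means a smaller gap between neighbouring bars.
p <- ggplot(shares, aes(x = sample, y = pct, fill = class)) +
  geom_col(width = 0.95) +
  coord_flip()

print(p)
```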

Jeroen Boeye answered Feb 08 '23 06:02