
Create a different color scale for each bar in a ggplot2 stacked bar graph

I have a stacked bar chart that looks like this:

[Figure: Number of patients on each drug by drug class]

While the colors look nice, it is confusing to have so many similar colors representing different drugs. I would like to have a separate color palette for each bar in the graph. For example, class1 could use the palette "Blues" while class2 could use the palette "BuGn" (both are RColorBrewer palette names).

I have found some examples in which people manually coded the colors for each bar, but I'm not sure whether what I'm asking is possible. The colors would need to come from palettes, since there are so many drugs in each drug class.

Code to create the above graph:

library(ggplot2)
library(plyr)
library(RColorBrewer)

drug_name <- c("a", "a", "b", "b", "b", "c", "d", "e", "e", "e", "e", "e", "e",
           "f", "f", "g", "g", "g", "g", "h", "i", "j", "j", "j", "k", "k",
           "k", "k", "k", "k", "l", "l", "m", "m", "m", "n", "o")
df <- data.frame(drug_name)

#get the frequency of each drug name
df_count <- count(df, 'drug_name')

#add a column that specifies the drug class
df_count$drug_class <- vector(mode='character', length=nrow(df_count))

df_count$drug_class[df_count$drug_name %in% c("a", "c", "e", "f")] <- 'class1'

df_count$drug_class[df_count$drug_name %in% c("b", "o")] <- 'class2'

df_count$drug_class[df_count$drug_name %in% c("d", "h", "i")] <- 'class3'

df_count$drug_class[df_count$drug_name %in% c("g", "j", "k", "l", "m", "n")] <- 'class4'

#expand color palette (from http://novyden.blogspot.com/2013/09/how-to-expand-color-palette-with-ggplot.html)

colorCount = length(unique(df_count$drug_name))
getPalette = colorRampPalette(brewer.pal(9, "Set1"))

test_plot <- ggplot(data = df_count, aes(x=drug_class, y=freq, fill=drug_name) ) + geom_bar(stat="identity") + scale_fill_manual(values=getPalette(colorCount))

test_plot
asked Mar 11 '16 18:03 by epi_n00b


1 Answer

With so many colors, your plot is going to be confusing. It's probably better to just label each bar section with the drug name and the count. The code below shows one way to make separate palettes for each bar and also how to label the bars.

First, add a column that we'll use for positioning the bar labels:

library(dplyr) # for the chaining (%>%) operator

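## Add a column for positioning drug labels on graph.
## ggplot2 (2.2.0 and later) stacks the first fill level at the top of each bar,
## so accumulate from the bottom up to find the midpoint of each section
df_count = df_count %>% group_by(drug_class) %>%
  mutate(cum.freq = rev(cumsum(rev(freq))) - 0.5*freq)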
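
As an alternative sketch, assuming ggplot2 2.2.0 or later, the labels can be centered in each bar section without a precomputed column by stacking the text layer with position_stack(vjust = 0.5). The toy data frame below is made up for illustration and is not the question's data.

```r
library(ggplot2)

# Toy data for illustration only (not the question's data)
toy <- data.frame(
  drug_class = c("class1", "class1", "class2"),
  drug_name  = c("a", "c", "b"),
  freq       = c(2, 1, 3)
)

# position_stack(vjust = 0.5) stacks the labels the same way as the bars
# and places each one at the vertical midpoint of its section
p <- ggplot(toy, aes(x = drug_class, y = freq, fill = drug_name)) +
  geom_col(colour = "black") +
  geom_text(aes(label = paste0(drug_name, ": ", freq)),
            position = position_stack(vjust = 0.5), colour = "grey20")
```

Because the text layer inherits the same grouping as the bars, the labels should stay aligned even if the stacking order changes.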

Second, create the palettes. The code below uses four different ColorBrewer palettes, but you can use any combination of palette-creating functions or methods to control the colors as finely as you wish.

## Create separate palette for each drug class

# Count the number of colors we'll need for each bar
ncol = table(df_count$drug_class)

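# Make the palettes (SIMPLIFY=FALSE keeps a list even if every class has the same count)
pal = mapply(function(x,y) brewer.pal(x,y), ncol, c("BrBG","OrRd","YlGn","Set2"), SIMPLIFY=FALSE)
pal[[2]] = pal[[2]][1:2]  # We only need 2 colors but brewer.pal creates 3 minimum
pal = unlist(unname(pal)) # Combine palettes into single vector of colors
# Name each color by its drug (rows ordered by class) so that scale_fill_manual
# matches colors to drugs by name rather than by alphabetical position
names(pal) = as.character(df_count$drug_name[order(df_count$drug_class)])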

ggplot(data = df_count, aes(x=drug_class, y=freq, fill=drug_name) ) + 
  geom_bar(stat="identity", colour="black", lwd=0.2) + 
  geom_text(aes(label=paste0(drug_name,": ", freq), y=cum.freq), colour="grey20") +
  scale_fill_manual(values=pal) +
  guides(fill="none")
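
Since brewer.pal returns at least 3 colors and, for sequential palettes, at most 9, a class with very few or very many drugs needs special handling. One possible sketch is to interpolate each class's palette with colorRampPalette and name every color by its drug. The helper name ramp_pal, the toy data, and the class-to-palette assignment below are all hypothetical.

```r
library(RColorBrewer)

# Hypothetical helper: n colors interpolated from one sequential ColorBrewer palette
ramp_pal <- function(n, palette) {
  # Drop the two palest shades so every section stays visible on a light background
  base_cols <- brewer.pal(9, palette)[3:9]
  colorRampPalette(base_cols)(n)
}

# Toy data for illustration only
toy <- data.frame(
  drug_name  = c("a", "b", "c", "d", "e"),
  drug_class = c("class1", "class2", "class1", "class3", "class1"),
  stringsAsFactors = FALSE
)

# Assumed assignment of one palette per class
class_pals <- c(class1 = "Blues", class2 = "BuGn", class3 = "Oranges")

# Build one named color vector per class, then combine them
by_class  <- split(toy$drug_name, toy$drug_class)
pal_list  <- Map(function(drugs, p) setNames(ramp_pal(length(drugs), p), drugs),
                 by_class, class_pals[names(by_class)])
pal_named <- unlist(unname(pal_list))
```

The resulting named vector can be passed to scale_fill_manual(values = pal_named), which matches colors to drugs by name, so the order of the drugs in the data does not matter.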


There are many strategies and functions for creating color palettes. Here's another method, using the hcl function:

lum = seq(100, 50, length.out=4)    # Vary the luminance for each bar
shift = seq(20, 60, length.out=4)  # Shift the hues for each bar

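pal2 = mapply(function(n, l, s) hcl(seq(0 + s, 360 + s, length.out=n+1)[1:n], 100, l), 
              ncol, lum, shift, SIMPLIFY=FALSE)
pal2 = unlist(unname(pal2))
# Name the colors by drug (rows ordered by class) so they can be matched by name,
# e.g. with scale_fill_manual(values=pal2)
names(pal2) = as.character(df_count$drug_name[order(df_count$drug_class)])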
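
If the goal is one color family per bar, a variation on the hcl approach is to hold the hue fixed within each class and vary only the luminance. This is a minimal sketch using base R only; the helper name shades, the hue values, and the chroma and luminance ranges are arbitrary assumptions chosen for illustration.

```r
# Hypothetical helper: n shades of a single hue, running from light to dark
shades <- function(n, hue) {
  hcl(h = hue, c = 60, l = seq(85, 35, length.out = n))
}

# Number of drugs in each class (matches the question's data)
n_per_class <- c(class1 = 4, class2 = 2, class3 = 3, class4 = 6)

# Assumed hue (in degrees) for each class
hues <- c(class1 = 250, class2 = 140, class3 = 30, class4 = 320)

# One vector of hex colors per class
pal3 <- Map(shades, n_per_class, hues)
```

Each element of pal3 is a character vector of hex colors for one class. As with the other palettes, the colors still need to be named by drug before being passed to scale_fill_manual.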
answered Sep 25 '22 19:09 by eipi10