How to add different boxplots to the same plot based on different data sources in ggplot /R?

Tags: r, ggplot2

Please find my data below. Note that the picture below is only an example of the design I wish to copy; it does not correspond to my data.

My data are stored in p. The continuous covariate p$ki67pro denotes the percentage of actively dividing cells in a tumor sample (so it ranges from 0 to 100). There are three tumor grades, coded as p$WHO.Grade == 1, 2, or 3. Each row represents one tumor patient who either had a recurrence (p$recurrence==1) or did not (p$recurrence==0).

Therefore:

head(p)
   WHO.Grade recurrence ki67pro
1          1          0       1
2          2          0      12
3          1          0       3
9          1          0       3
10         1          0       5
11         1          0       3

I wish to produce the boxplot below. As you can see, the x-axis has four positions: one for each p$WHO.Grade and one for all samples combined (All). Each position holds two boxplots.

[Figure: example of the desired boxplot design]

Per p$WHO.Grade and All, I want one boxplot to represent p$ki67pro for recurrent tumors (p$recurrence==1) and the other boxplot to represent p$ki67pro for non-recurrent tumors (p$recurrence==0).

I.e.

p$ki67pro[p$WHO.Grade==1 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==1 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==2 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==2 & p$recurrence==1]

p$ki67pro[p$WHO.Grade==3 & p$recurrence==0] versus p$ki67pro[p$WHO.Grade==3 & p$recurrence==1]

And for All

p$ki67pro[p$recurrence==0] versus p$ki67pro[p$recurrence==1]
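
The comparisons listed above can be inspected in base R before any plotting. This is a minimal sketch, assuming only the first eight rows of p (copied from the data further down) stored under the hypothetical name p_small; the object names by_grade and overall are likewise made up for illustration.

```r
# First eight rows of the question's data `p` (a subset, for illustration only)
p_small <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L)
)

# Per-grade comparison: one vector of ki67pro values per grade/recurrence pair.
# The list names have the form "<grade>.<recurrence>", e.g. "1.0".
by_grade <- split(p_small$ki67pro,
                  list(p_small$WHO.Grade, p_small$recurrence))

# "All" comparison: ignore the grade and split by recurrence only
overall <- split(p_small$ki67pro, p_small$recurrence)

# Group sizes; empty combinations show up with length 0
lengths(by_grade)
lengths(overall)
```

Groups of size 0 or 1 are worth spotting at this stage, because a boxplot drawn from a single value collapses to a line.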

I have used the following script so far, but I cannot figure out how to include the All group. Please note that there is only one case with p$WHO.Grade==3.

df <- data.frame(x = as.factor(c(p$WHO.Grade)),
                 y = c(p$ki67pro),
                 f = rep(c("ki67pro"), c(nrow(p))))

df <- df[!is.na(df$x),]
ggplot(df) +
  geom_boxplot(aes(x, y, fill = f, colour = f), outlier.alpha = 0, position = position_dodge(width = 0.78)) +
  scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
  stat_boxplot(aes(x, y, colour = f), geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
  geom_point(aes(x, y, fill = f, colour = f), size = 3, shape = 21, position = position_jitterdodge()) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + theme(legend.position="none")

My data, p:

p <- structure(list(WHO.Grade = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), recurrence = c(0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), ki67pro = c(1L, 12L, 
3L, 3L, 5L, 3L, 20L, 25L, 7L, 4L, 5L, 12L, 3L, 15L, 4L, 5L, 7L, 
8L, 3L, 12L, 10L, 4L, 10L, 7L, 3L, 2L, 3L, 7L, 4L, 7L, 10L, 4L, 
5L, 5L, 3L, 5L, 2L, 5L, 3L, 3L, 3L, 4L, 4L, 3L, 2L, 5L, 1L, 5L, 
2L, 3L, 1L, 2L, 3L, 3L, 5L, 4L, 20L, 5L, 0L, 4L, 3L, 0L, 3L, 
4L, 1L, 2L, 20L, 2L, 3L, 5L, 4L, 8L, 1L, 4L, 5L, 4L, 3L, 6L, 
12L, 3L, 4L, 4L, 2L, 5L, 3L, 3L, 3L, 2L, 5L, 4L, 2L, 3L, 4L, 
3L, 3L, 2L, 2L, 4L, 7L, 4L, 3L, 4L, 2L, 3L, 6L, 2L, 3L, 10L, 
5L, 10L, 3L, 10L, 3L, 4L, 5L, 2L, 4L, 3L, 4L, 4L, 4L, 5L, 3L, 
12L, 5L, 4L, 3L, 2L, 4L, 3L, 4L, 2L, 1L, 6L, 1L, 4L, 12L, 3L, 
4L, 3L, 2L, 6L, 5L, 4L, 3L, 4L, 4L, 4L, 3L, 5L, 4L, 5L, 4L, 1L, 
3L, 3L, 4L, 0L, 3L)), class = "data.frame", row.names = c(1L, 
2L, 3L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 44L, 45L, 46L, 47L, 48L, 
49L, 50L, 51L, 52L, 53L, 54L, 55L, 57L, 59L, 60L, 61L, 62L, 63L, 
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 87L, 89L, 90L, 91L, 
92L, 93L, 94L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 
105L, 106L, 107L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 
117L, 118L, 119L, 120L, 121L, 123L, 124L, 125L, 126L, 127L, 128L, 
130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 
141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L, 
152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L, 
163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L, 
174L, 175L))

cmirian asked Dec 10 '22 02:12


2 Answers

One trick is to add a fourth level to WHO.Grade, which has only three levels, and to fill that level with a copy of every row so that it represents All. The extra level only needs to exist for the plot, so it can be created inside a dplyr pipeline with bind_rows() and mutate(), leaving p itself unchanged.

Note that there is no need to create a new data frame df.

library(ggplot2)
library(dplyr)

p %>%
  bind_rows(p %>% mutate(WHO.Grade = 4)) %>%
  mutate(WHO.Grade = factor(WHO.Grade),
         recurrence = factor(recurrence)) %>%
  ggplot(aes(WHO.Grade, ki67pro, 
             fill = recurrence, colour = recurrence)) +
  geom_boxplot(outlier.alpha = 0, 
               position = position_dodge(width = 0.78, preserve = "single")) +
  geom_point(size = 3, shape = 21, 
             position = position_jitterdodge()) +
  scale_x_discrete(name = "", 
                   label = c("WHO-I","WHO-II","WHO-III","All")) +
  scale_y_continuous(name = "x", breaks=seq(0,30,5), limits=c(0,30)) +
  scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
  scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + 
  theme(legend.position="none")
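
The same stacking idea can be sketched in base R, with the axis labels attached to the factor itself rather than supplied through scale_x_discrete(). This is only an illustration under stated assumptions: p_small holds the first eight rows of p, and the names p_copy and stacked as well as the recurrence labels "No recurrence" and "Recurrence" are hypothetical, not part of the original answer.

```r
# First eight rows of the question's data `p` (a subset, for illustration only)
p_small <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L)
)

# Copy every row and relabel the copy as a fourth pseudo-grade standing for "All"
p_copy <- p_small
p_copy$WHO.Grade <- 4L
stacked <- rbind(p_small, p_copy)

# Explicit levels and labels fix the order of the groups on the x-axis
stacked$WHO.Grade <- factor(stacked$WHO.Grade,
                            levels = 1:4,
                            labels = c("WHO-I", "WHO-II", "WHO-III", "All"))

# Hypothetical labels for the two recurrence groups
stacked$recurrence <- factor(stacked$recurrence,
                             levels = c(0, 1),
                             labels = c("No recurrence", "Recurrence"))

# Cross-tabulation: the "All" row should equal the column totals of the grades
table(stacked$WHO.Grade, stacked$recurrence)
```

With the labels stored in the data, a call such as ggplot(stacked, aes(WHO.Grade, ki67pro, fill = recurrence)) should show them on the axis without a separate label vector.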

[Figure: resulting plot]

Rui Barradas answered Dec 12 '22 14:12


What about something like this:

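# duplicate the original data
p1 <- p
# relabel every row of the copy so that it forms the "all" group
p1$WHO.Grade <- 'all'
# stack the copy and the original under a new name, so p itself stays untouched
# and re-running the script does not duplicate the rows again
p_all <- rbind(p1,p)

library(ggplot2)
ggplot(p_all) +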
geom_boxplot(aes(as.factor(WHO.Grade),
                  y = ki67pro,
                  fill = factor(recurrence) ,
                  color = factor(recurrence) ),
             outlier.alpha = 0 , position = position_dodge(width = 0.78)) +
# from here it's more or less your code
scale_x_discrete(name = "", label=c("WHO-I","WHO-II","WHO-III","All")) +
scale_y_continuous(name="x", breaks=seq(0,30,5), limits=c(0,30)) +
stat_boxplot(aes(as.factor(WHO.Grade),
                  y = ki67pro,
                  color = factor(recurrence) ),
              geom = "errorbar", width = 0.3,position = position_dodge(0.7753)) +
geom_point(aes(as.factor(WHO.Grade),
               y = ki67pro,
              color = factor(recurrence) ),
           size = 3, shape = 21, position = position_jitterdodge()) +
scale_fill_manual(values = c("#edf1f9", "#fcebeb"), name = "",
                    labels = c("", "")) +
scale_colour_manual(values = c("#1C73C2", "red"), name = "",
                      labels = c("","")) + 
theme(legend.position="none",
      panel.background = element_blank(),
      axis.line = element_line(colour = "black")) 
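
One detail of this approach is worth checking: binding the relabelled copy to the original turns WHO.Grade into a character column, and as.factor() then orders the levels by sorting the strings. A small sketch of how to pin that order explicitly, assuming the first eight rows of p under the hypothetical name p_small (the name combined is also made up here):

```r
# First eight rows of the question's data `p` (a subset, for illustration only)
p_small <- data.frame(
  WHO.Grade  = c(1L, 2L, 1L, 1L, 1L, 1L, 3L, 2L),
  recurrence = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  ki67pro    = c(1L, 12L, 3L, 3L, 5L, 3L, 20L, 25L)
)

# Relabel a copy as "all" and stack it with the original rows
p1 <- p_small
p1$WHO.Grade <- "all"
combined <- rbind(p1, p_small)

# The integer grades have been coerced to character by rbind()
is.character(combined$WHO.Grade)

# Setting the levels by hand keeps "all" last regardless of string sorting
combined$WHO.Grade <- factor(combined$WHO.Grade,
                             levels = c("1", "2", "3", "all"))
levels(combined$WHO.Grade)
```

Fixing the levels this way keeps the label vector c("WHO-I","WHO-II","WHO-III","All") aligned with the groups even if the "all" label is later renamed to something that sorts differently.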

[Figure: resulting plot]

s__ answered Dec 12 '22 15:12