
Producing a boxplot in ggplot2 using summary statistics

Tags: r, ggplot2, boxplot

Below is code that produces a boxplot with ggplot2, which I'm trying to adapt to my problem:

library(ggplot2)
set.seed(1)
# create fictitious data
a <- rnorm(10)
b <- rnorm(12)
c <- rnorm(7)
d <- rnorm(15)

# data groups
group <- factor(rep(1:4, c(10, 12, 7, 15)))

# dataframe
mydata <- data.frame(c(a,b,c,d), group)
names(mydata) <- c("value", "group")

# function for computing min, mean - SD, mean, mean + SD and max
min.mean.sd.max <- function(x) {
  r <- c(min(x), mean(x) - sd(x), mean(x), mean(x) + sd(x), max(x))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# ggplot code
p1 <- ggplot(aes(y = value, x = factor(group)), data = mydata)
p1 <- p1 +
  stat_summary(fun.data = min.mean.sd.max, geom = "boxplot") +
  ggtitle("Boxplot with mean, mean +/- SD, min and max") +
  xlab("Groups") +
  ylab("Values")

In my case, I do not have the actual data points, only their mean and standard deviation (the data are normally distributed). For this example, that would be:

mydata.mine <- data.frame(
  mean = c(mean(a), mean(b), mean(c), mean(d)),
  sd = c(sd(a), sd(b), sd(c), sd(d)),
  group = c(1, 2, 3, 4)
)

However, I would still like to produce a boxplot. I thought of defining:

ymin = mean - 3*sd
lower = mean - sd
middle = mean
upper = mean + sd
ymax = mean + 3*sd

but I don't know how to define a function that can access the mean and sd columns of mydata.mine from fun.data in stat_summary. Alternatively, I could use rnorm to draw points from a normal distribution parameterized by the mean and sd I have, but the first option seems more elegant and simpler to me.

user1701545 asked Mar 06 '14 01:03



1 Answer

ggplot(mydata.mine, aes(x = as.factor(group))) +
  geom_boxplot(aes(
      lower = mean - sd, 
      upper = mean + sd, 
      middle = mean, 
      ymin = mean - 3*sd, 
      ymax = mean + 3*sd),
    stat = "identity")
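
With stat = "identity", geom_boxplot() skips its own quartile computation and draws the box from the ymin, lower, middle, upper and ymax aesthetics it is given. The same idea can be sketched with the five values precomputed as columns, which makes them easy to inspect before plotting. The numbers below are hypothetical stand-ins for mydata.mine, and the names stats and box_stats are only illustrative:

```r
# Summary statistics only (hypothetical values standing in for mydata.mine)
stats <- data.frame(
  group = factor(1:4),
  mean  = c(0.13, 0.25, -0.10, 0.05),
  sd    = c(0.78, 0.87, 1.10, 0.95)
)

# Precompute the five values geom_boxplot() needs for each group;
# inside transform(), mean and sd refer to the data frame columns
box_stats <- transform(
  stats,
  ymin   = mean - 3 * sd,
  lower  = mean - sd,
  middle = mean,
  upper  = mean + sd,
  ymax   = mean + 3 * sd
)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(box_stats, aes(x = group)) +
    geom_boxplot(
      aes(ymin = ymin, lower = lower, middle = middle,
          upper = upper, ymax = ymax),
      stat = "identity"
    ) +
    xlab("Group") +
    ylab("Value")
}
```

Printing p draws the plot. Note that these boxes are not conventional quartile boxplots: under the normality assumption stated in the question, the box (mean +/- sd) covers roughly 68% of the distribution and the whiskers (mean +/- 3*sd) roughly 99.7%, so it may be worth saying so in the title or caption.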

Christie Haskell Marsh answered Sep 18 '22 12:09