
Consistent width of boxplots if missing data by group?

Tags: r, ggplot2, boxplot

A similar question has already been discussed for bar plots, but it has no solution for boxplots: Consistent width for geom_bar in the event of missing data

I would like to produce boxplots by group. However, data for some groups can be missing, and the remaining boxplot at that position is then drawn wider than the others.

I tried specifying geom_boxplot(width = value) or geom_boxplot(varwidth = F), but neither works.

Also, as suggested in the bar plot example, I tried adding NA values for the missing group. The boxplot simply skips the missing data and still extends the box width, and I get this warning:

Warning messages:
1: Removed 1 rows containing non-finite values (stat_boxplot). 

Dummy example:

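# load ggplot2
library(ggplot2)

# create the columns: 7 varieties, 2 treatments, 20 observations per combination
variety   <- rep(LETTERS[1:7], each = 40)
treatment <- rep(c("high", "low"), each = 20)  # recycled to 280 rows by data.frame()
note      <- seq_len(280) + sample(1:150, 280, replace = TRUE)

# put the data together
data <- data.frame(variety, treatment, note)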

ggplot(data, aes(x=variety, y=note, fill=treatment)) + 
  geom_boxplot()

Boxplots have the same width if there are values for each group:


Remove the values for 1 group:

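# subset the data so that one group (variety E, treatment "high") has no rows:
data.sub <- subset(data, treatment != "high" | variety != "E")

# open a 4 x 3 inch graphics device (Windows only; use dev.new(width = 4, height = 3) elsewhere)
windows(4, 3)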
ggplot(data.sub, aes(x=variety, y=note, fill=treatment)) + 
  geom_boxplot()
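
To confirm which variety/treatment combination is actually empty after subsetting, a quick cross-tabulation can help. This is only a minimal sketch that rebuilds the example data from above; the name `counts` is introduced here for illustration and is not part of the original question.

```r
# Rebuild the example data (same structure as in the question).
variety   <- rep(LETTERS[1:7], each = 40)
treatment <- rep(c("high", "low"), each = 20)  # recycled to 280 rows
note      <- seq_len(280) + sample(1:150, 280, replace = TRUE)
data      <- data.frame(variety, treatment, note)

# Drop every row with variety "E" and treatment "high".
data.sub <- subset(data, treatment != "high" | variety != "E")

# Cross-tabulate variety by treatment; a zero marks a combination with no rows.
counts <- table(data.sub$variety, data.sub$treatment)
print(counts)
```

A zero in the table marks the position where only one box is drawn.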

The boxplot for the variety with missing data is wider than the others:



Is there a way to keep the width of the boxplots constant?

maycca asked Aug 31 '18 21:08





1 Answer

We can make use of the preserve argument in position_dodge.

From ?position_dodge

preserve: Should dodging preserve the total width of all elements at a position, or the width of a single element?

ggplot(data.sub, aes(x=variety, y=note, fill=treatment)) + 
 geom_boxplot(position = position_dodge(preserve = "single"))
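
The effect of preserve = "single" can also be checked numerically rather than by eye. The sketch below rebuilds the example data and compares the drawn box widths using ggplot2's layer_data(); the helper `box_widths` and the plot names are made up for this illustration. It also shows position_dodge2(preserve = "single"), which, as far as I know, centres the lone box at its position instead of leaving a gap on one side.

```r
library(ggplot2)

# Rebuild the example data with one missing group (variety E, treatment "high").
variety   <- rep(LETTERS[1:7], each = 40)
treatment <- rep(c("high", "low"), each = 20)  # recycled to 280 rows
note      <- seq_len(280) + sample(1:150, 280, replace = TRUE)
data      <- data.frame(variety, treatment, note)
data.sub  <- subset(data, treatment != "high" | variety != "E")

base <- ggplot(data.sub, aes(x = variety, y = note, fill = treatment))

# Default dodging: the total width at each x position is preserved.
p_total <- base + geom_boxplot()

# Dodging that preserves the width of a single box.
p_single <- base + geom_boxplot(position = position_dodge(preserve = "single"))

# Alternative (assumption: centres the lone box rather than left-aligning it).
p_centered <- base + geom_boxplot(position = position_dodge2(preserve = "single"))

# Hypothetical helper: width of every drawn box, taken from the built layer data.
box_widths <- function(p) {
  d <- layer_data(p)
  round(d$xmax - d$xmin, 6)
}

print(unique(box_widths(p_total)))   # distinct widths in the default plot
print(unique(box_widths(p_single)))  # distinct widths with preserve = "single"
```

If the second call prints a single value, every box, including the one for variety E, has the same width.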


markus answered Oct 24 '22 15:10
