My question is similar to one previously discussed for bar plots, which has no solution for boxplots: Consistent width for geom_bar in the event of missing data
I would like to produce boxplots by group. However, data for some groups can be missing, which makes the remaining boxplot at that position wider than the others.
I tried specifying geom_boxplot(width = value) or geom_boxplot(varwidth = F), but neither works.
Also, as suggested in the bar plot example, I tried adding NA values for the missing group. The boxplot simply skips the missing data and still extends the box width. I got this warning:
Warning messages:
1: Removed 1 rows containing non-finite values (stat_boxplot).
Dummy example:
# library
library(ggplot2)
# create a data frame
variety=rep(LETTERS[1:7], each=40)
treatment=rep(c("high","low"),each=20)
note=seq(1:280)+sample(1:150, 280, replace=T)
# put data together
data=data.frame(variety, treatment , note)
ggplot(data, aes(x=variety, y=note, fill=treatment)) +
geom_boxplot()
Boxplots have the same width if there are values for each group:
Remove the values for one group:
# subset the data so that one variety/treatment combination has no data
data.sub<-subset(data, treatment != "high" | variety != "E" )
dev.new(width = 4, height = 3)  # portable replacement for the Windows-only windows(4,3)
ggplot(data.sub, aes(x=variety, y=note, fill=treatment)) +
geom_boxplot()
The boxplot for the variety with a missing group is wider than the others:
Is there a way to keep the width of the boxplots constant?
We can make use of the preserve argument in position_dodge.
From ?position_dodge
preserve: Should dodging preserve the total width of all elements at a position, or the width of a single element?
ggplot(data.sub, aes(x=variety, y=note, fill=treatment)) +
geom_boxplot(position = position_dodge(preserve = "single"))
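To check the result without relying on the picture, one option is to read the computed box coordinates back with layer_data() and compare the widths. The following is a minimal sketch that rebuilds the dummy data from the question; the fixed seed and the names p, p2, box_data and box_widths are my own additions for illustration. It also builds a variant with position_dodge2(preserve = "single"), which, as far as I know, centres the lone box at its position instead of left-aligning it.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# rebuild the dummy data from the question
variety   <- rep(LETTERS[1:7], each = 40)
treatment <- rep(rep(c("high", "low"), each = 20), times = 7)
note      <- 1:280 + sample(1:150, 280, replace = TRUE)
data      <- data.frame(variety, treatment, note)

# drop the "high" treatment for variety E, as in the question
data.sub <- subset(data, treatment != "high" | variety != "E")

# the fix from above: keep the width of a single element when dodging
p <- ggplot(data.sub, aes(x = variety, y = note, fill = treatment)) +
  geom_boxplot(position = position_dodge(preserve = "single"))

# alternative to try: same argument on position_dodge2
p2 <- ggplot(data.sub, aes(x = variety, y = note, fill = treatment)) +
  geom_boxplot(position = position_dodge2(preserve = "single"))

# one row per drawn box; xmin/xmax are the left and right edges of each box
box_data   <- layer_data(p)
box_widths <- round(box_data$xmax - box_data$xmin, 6)

# a single distinct value means every box has the same width
print(unique(box_widths))
```

Printing p or p2 draws the plot. If unique(box_widths) returns more than one value, the boxes are not all the same width.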