I'm trying to make a boxplot in R that is grouped by two different factors on the x-axis. My problem is labelling one factor with 20+ levels so that its labels span the entire graph appropriately, while using a legend to label the second factor, which has only 2 to 3 levels.
Here is a test script that roughly mimics my actual dataset:
d<-data.frame(x=rnorm(1500),f1=rep(seq(1:20),75),f2=rep(letters[1:3],500))
# first factor has 20+ levels
d$f1<-factor(d$f1)
# second factor a,b,c
d$f2<-factor(d$f2)
boxplot(x~f2*f1,data=d,col=c("red","blue","green"),frame.plot=TRUE,axes=FALSE)
# y axis is numeric and works fine
yts=pretty(d$x,n=5)
axis(2,yts)
# I know this doesn't work; what I'd like is to spread the f1 labels out
# so that each group of three boxes (a,b,c) is labeled correctly
axis(1,at=seq(1:20))
# Use the legend to handle the f2 factor labels
legend(1, max(d$x), c("a", "b","c"),fill = c("red", "blue","green"))
Thanks for any help
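One way to get the axis described above in base graphics is sketched below. It relies on the fact that boxplot() draws the boxes for x~f2*f1 at positions 1, 2, 3, ... with f2 varying fastest, so each f1 level owns a run of consecutive boxes and its label can go at the centre of that run. The names cols, n1, n2 and group_centres are just illustrative, and set.seed() and the dotted separators are additions for reproducibility and readability, not part of the original script.

```r
set.seed(1)  # assumption: seed added only so the example is reproducible

d <- data.frame(x  = rnorm(1500),
                f1 = factor(rep(1:20, 75)),            # 20 levels
                f2 = factor(rep(letters[1:3], 500)))   # levels a, b, c

cols <- c("red", "blue", "green")
n1 <- nlevels(d$f1)
n2 <- nlevels(d$f2)

# Boxes are drawn at x = 1 .. n1 * n2 in the order a.1, b.1, c.1, a.2, ...
bp <- boxplot(x ~ f2 * f1, data = d, col = cols,
              frame.plot = TRUE, axes = FALSE,
              xlab = "f1", ylab = "x")

# y axis as in the question
axis(2, at = pretty(d$x, n = 5))

# Centre of each run of n2 boxes: 2, 5, 8, ... when n2 is 3
group_centres <- (seq_len(n1) - 1) * n2 + (n2 + 1) / 2
axis(1, at = group_centres, labels = levels(d$f1))

# Optional: dotted lines between neighbouring f1 groups
abline(v = seq(n2 + 0.5, by = n2, length.out = n1 - 1),
       lty = 3, col = "grey")

# Legend handles the f2 labels
legend("topleft", legend = levels(d$f2), fill = cols, title = "f2")
```

If some of the 20 labels get dropped because they would overlap, las = 2 or a smaller cex.axis in the axis(1, ...) call may help.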
A basic boxplot uses one factor and one numeric column, drawing one box per level of the factor. With two factors in ggplot2, you can put one factor on the x-axis and map the second to the fill aesthetic of geom_boxplot, which draws a separate, differently coloured box for each combination of levels.
Box plot for multiple groups: To create a box plot by group in R, pass a formula of the form y ~ x to the boxplot function, where y is a numeric variable and x is a categorical variable (a factor). You can refer to the variables directly from the data frame using the dollar sign or by subsetting the data frame.
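As a small illustration of the formula interface, here is a sketch using the ToothGrowth data set that ships with R (chosen only for the example, it is not the data from the question). The variable names b1, b2 and b3 are arbitrary.

```r
# len is numeric; supp is a factor with levels OJ and VC; dose is numeric
# but has only three distinct values, so it works as a grouping variable.

# One grouping variable: one box per level of supp
b1 <- boxplot(len ~ supp, data = ToothGrowth)

# Two grouping variables: one box per supp/dose combination,
# with supp (the first term) varying fastest along the x-axis
b2 <- boxplot(len ~ supp * dose, data = ToothGrowth)

# Same as the first call, but accessing the columns with the dollar sign
b3 <- boxplot(ToothGrowth$len ~ ToothGrowth$supp)

# The returned list records the order in which the boxes were drawn
print(b2$names)
```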
Adding labels: The xlab and ylab parameters of the boxplot() function set the axis labels, and the main parameter sets the plot title. Setting notch = TRUE draws a notch around each median; if the notches of two boxes do not overlap, that is strong evidence that the two medians differ.
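A minimal sketch of those parameters, again on the built-in ToothGrowth data rather than the data from the question. The title and label strings are made up for the example.

```r
b <- boxplot(len ~ supp, data = ToothGrowth,
             main  = "Tooth length by supplement",  # plot title
             xlab  = "Supplement type",             # x-axis label
             ylab  = "Tooth length",                # y-axis label
             notch = TRUE)                          # notch around each median

# b$conf holds the lower and upper notch limits, one column per box
print(b$conf)
```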
FWIW, a ggplot2 solution:
library(ggplot2)
ggplot(data = d, aes(x = f1, y = x)) +
  geom_boxplot(aes(fill = f2), width = 0.8) +
  theme_bw()