Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create geom_boxplot with large amount of continuous x-variables

Tags:

r

ggplot2

boxplot

I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:

x <- seq(1,10,by=1)
y1 <- rnorm(10, mean=3)
y2 <- rnorm(10, mean=10)
y3<- rnorm(10, mean=1)
y4<- rnorm(10, mean=8)
y5<- rnorm(10, mean=12)
df <- data.frame(x,y1,y2,y3,y4,y5)
df.m <- melt(df, id="x")

My code to create the x-axis data as a factor:

df.m$x <- as.factor(df.m$x)

My ggplot:

ggplot(df.m, aes(x=x, y=value))+
 geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.1)+
 theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

The resulting plot: The resulting plot:

The problem is that I cannot use x-axis numeric spacing because the x-axis is categorized as a factor, which has equal spacing. I want to be able to use something like scale_x_continuous to manipulate the axis breaks and spacing to, say, an interval of 2, rather than a boxplot every 1, but when I try to plot the data with the x-axis "as.numeric", I just get one boxplot of all of the data:

plot

Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!

like image 732
AndMan21 Avatar asked Nov 20 '14 22:11

AndMan21


People also ask

Can you make a Boxplot from continuous data?

To create a box plot for a continuous variable, first, install the necessary packages for plotting box plots and then create or load the dataset for which we want to plot the box plot. Plot the box plot using geom_boxplot() function like a regular boxplot.

How can you create a Boxplot using ggplot2?

In ggplot2, geom_boxplot() is used to create a boxplot. Let us first create a regular boxplot, for that we first have to import all the required libraries and dataset in use. Then simply put all the attributes to plot by in ggplot() function along with geom_boxplot.


1 Answers

Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.

ggplot(df, aes(x=CH, y=value,group=CH))+
  geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.2)+
  scale_x_log10()

So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).

With your real data there is another problem though. Your CH is more or less logarithmically spaced, so there are about as many points < 1 as there are between 1 - 10, etc. ggplot wants to make the boxes all the same size, so with a linear x-axis the box width is smaller than the line width, and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.

like image 52
jlhoward Avatar answered Oct 04 '22 16:10

jlhoward