I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:
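library(ggplot2)
library(reshape2)  # provides melt()
set.seed(1)        # make the random data reproducible
x <- seq(1, 10, by = 1)
y1 <- rnorm(10, mean = 3)
y2 <- rnorm(10, mean = 10)
y3 <- rnorm(10, mean = 1)
y4 <- rnorm(10, mean = 8)
y5 <- rnorm(10, mean = 12)
df <- data.frame(x, y1, y2, y3, y4, y5)
df.m <- melt(df, id.vars = "x")  # long format: columns x, variable, value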
My code to create the x-axis data as a factor:
df.m$x <- as.factor(df.m$x)
My ggplot:
ggplot(df.m, aes(x=x, y=value))+
geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.1)+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
The resulting plot:
The problem is that I cannot use numeric spacing on the x-axis, because as a factor the bins are treated as equally spaced categories. I want to use something like scale_x_continuous to control the axis breaks and spacing (say, an interval of 2 rather than a boxplot every 1), but when I plot the data with the x-axis converted with as.numeric, I just get one boxplot of all of the data:
Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!
To create a box plot in ggplot2, install and load the required packages, then create or load the dataset you want to plot. Pass the data and the aesthetic mappings to ggplot() and add geom_boxplot(), just as for a regular boxplot. If the x variable is numeric rather than a factor, you also need to map it to the group aesthetic; otherwise ggplot2 pools all the data into a single box.
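A minimal sketch of the difference, using the built-in mtcars data (our choice of example, not from the question): a factor x gives one box per level, while a numeric x without a group aesthetic collapses into one box.

```r
library(ggplot2)

# Factor x: one box per level of cyl (4, 6 and 8 cylinders)
p_factor <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Numeric x with no group aesthetic: all rows form a single group,
# so ggplot2 draws one box (and warns about the continuous x aesthetic)
p_numeric <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_boxplot()

# layer_data() returns one row per box that would be drawn
nrow(layer_data(p_factor))
nrow(layer_data(p_numeric))
```

The first count is 3 and the second is 1, which is the "one boxplot of all of the data" symptom from the question.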
Here is a way using the original data you posted on Google, which was actually much more helpful, IMO.
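library(ggplot2)
ggplot(df, aes(x=CH, y=value, group=CH))+  # CH stays numeric; group=CH gives one box per CH value
geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.2)+
scale_x_log10()  # CH is roughly log-spaced, see below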
So, as @BenBolker said before he deleted his answer (??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).
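Applied to the reproducible data from the question, a sketch of the same idea might look like this. It assumes the reshape2 package for melt(), as in the question, and the break interval of 2 is just the example the question mentions.

```r
library(ggplot2)
library(reshape2)  # for melt()

set.seed(1)
df <- data.frame(
  x  = 1:10,
  y1 = rnorm(10, mean = 3),
  y2 = rnorm(10, mean = 10),
  y3 = rnorm(10, mean = 1),
  y4 = rnorm(10, mean = 8),
  y5 = rnorm(10, mean = 12)
)
df.m <- melt(df, id.vars = "x")  # x stays numeric: no as.factor()

p <- ggplot(df.m, aes(x = x, y = value, group = x)) +  # one box per x value
  geom_boxplot(outlier.shape = NA, fill = "red", alpha = 0.1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))      # numeric axis control

p
```

Because x is still numeric, scale_x_continuous() works as usual. If you instead want to pool neighbouring x values into wider boxes, ggplot2's cut_width() can be used as the grouping variable, e.g. aes(group = cut_width(x, 2)).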
With your real data there is another problem, though. Your CH is more or less logarithmically spaced, so there are about as many points below 1 as there are between 1 and 10, and so on. ggplot makes all the boxes the same width, derived from the smallest gap between adjacent x values, so with a linear x-axis the box width ends up smaller than the line width and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less, because the values are then roughly evenly spaced.
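Since the Google data with the CH column is not reproduced here, the sketch below uses a hypothetical, log-spaced stand-in for CH (the values, sample size and distribution are made up for illustration) to show group = CH combined with scale_x_log10().

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for CH: 9 log-spaced values from 0.01 to 100
ch_levels <- 10^seq(-2, 2, by = 0.5)
dat <- data.frame(
  CH    = rep(ch_levels, each = 20),
  value = rnorm(length(ch_levels) * 20, mean = 5)
)

p <- ggplot(dat, aes(x = CH, y = value, group = CH)) +
  geom_boxplot(outlier.shape = NA, fill = "red", alpha = 0.2) +
  scale_x_log10()  # boxes are computed on log10(CH), which is evenly spaced here

p
```

With scale_x_log10() the statistics are computed on the transformed values, so in layer_data(p) the x positions are log10(CH), spaced 0.5 apart, and every box gets a visible width.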