Individual binwidths in faceted histogram on ggplot2

Tags: r, ggplot2

I am drawing a series of histograms with facet_grid, and I want every histogram in the grid to have the same number of classes (bins), e.g. 6 in the example below. The problem is that binwidth = diff(range(x$data))/6 is computed from the overall range of a, b and c, so a single binwidth is applied to all three facets.

How do I define binwidth individually for the facets a, b and c?

require("ggplot2")

a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
       0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
       2.01,0.64,0.47,1.33,3.44)

x <- data.frame(data = c(a,b,c), variable = c(rep("a",20),rep("b",10),rep("c",15)),area="random")

qplot(data, data = x, geom = "histogram", binwidth = diff(range(x$data))/6) +
  facet_grid(area~variable, scales = "free")
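
To make the mismatch concrete, the sketch below compares the single pooled binwidth with the binwidth each facet would need to get 6 classes over its own range. It uses base R only; the names n_bins, shared_binwidth and per_facet_binwidth are illustrative and not part of the original code.

```r
# Same example data as in the question.
a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
       0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
       2.01,0.64,0.47,1.33,3.44)
x <- data.frame(data = c(a, b, c),
                variable = c(rep("a", 20), rep("b", 10), rep("c", 15)),
                area = "random")

n_bins <- 6  # desired number of classes per facet (illustrative name)

# Binwidth used in the question: one value derived from the pooled range.
shared_binwidth <- diff(range(x$data)) / n_bins

# Binwidth each facet would need to split its own range into n_bins classes.
per_facet_binwidth <- tapply(x$data, x$variable,
                             function(v) diff(range(v)) / n_bins)

print(shared_binwidth)
print(per_facet_binwidth)
```

The pooled value is close to what facet c needs but several times wider than what a and b need, which is why those two facets end up with only a few bars.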
Ben asked Jul 04 '14 09:07



2 Answers

This is not optimal, but you can draw each facet's histogram as a separate layer with its own binwidth:

ggplot(x, aes(x=data)) +
   geom_histogram(data=subset(x, variable=="a"), binwidth=.1) +
   geom_histogram(data=subset(x, variable=="b"), binwidth=.2) +
   geom_histogram(data=subset(x, variable=="c"), binwidth=.5) +
   facet_grid(area~variable, scales="free")
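
The binwidths above are hard-coded. As a possible generalization, the layers can be built in a loop so that each facet gets 6 classes from its own range. This is only a sketch: facet_breaks is a hypothetical helper, not a ggplot2 function, and it passes explicit breaks to geom_histogram() instead of a binwidth so that the number of classes does not depend on how the bins are aligned.

```r
library(ggplot2)

# Same example data as in the question.
a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
       0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
       2.01,0.64,0.47,1.33,3.44)
x <- data.frame(data = c(a, b, c),
                variable = c(rep("a", 20), rep("b", 10), rep("c", 15)),
                area = "random")

n_bins <- 6  # desired number of classes per facet

# Hypothetical helper: n + 1 equally spaced break points spanning the values,
# i.e. the edges of n equal-width classes.
facet_breaks <- function(v, n = n_bins) {
  seq(min(v), max(v), length.out = n + 1)
}

# One histogram layer per level of `variable`, each with its own breaks.
layers <- lapply(split(x, x$variable), function(d) {
  geom_histogram(data = d, breaks = facet_breaks(d$data))
})

# A list of layers can be added to a ggplot object in one step.
p <- ggplot(x, aes(x = data)) +
  layers +
  facet_grid(area ~ variable, scales = "free")
p
```

Because each layer's data keeps its variable and area columns, facet_grid() still places every layer in the matching panel.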
jenswirf answered Oct 13 '22 09:10


One way is to pre-summarize your data the way you want it, and then create the plot.

In your case, you need to bin your variable with the function cut(). The dplyr package is convenient for this, because mutate() on grouped data is applied to each group separately, so every facet gets its own 6 intervals:

library(dplyr)

zz <- x %>%
  group_by(variable) %>%
  mutate(
    bins = cut(data, breaks=6)
  )

qplot(bins, data = zz, geom = "histogram", fill=I("blue")) +
  facet_grid(area~variable, scales = "free") +
  theme(axis.text.x = element_text(angle=90))
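
With more recent ggplot2 releases, the call above may fail: as far as I know, the histogram geom now expects a continuous x, and qplot() has been deprecated. Since bins is already a discrete variable, a bar chart of counts gives the same picture. The following is a sketch of that variant under those assumptions, not the original answer's code.

```r
library(dplyr)
library(ggplot2)

# Same example data as in the question.
a <- c(1.21,1.57,1.21,0.29,0.36,0.29,0.93,0.26,0.28,0.48,
       0.12,0.38,0.83,0.82,0.41,0.69,0.25,0.98,0.52,0.11)
b <- c(0.42,0.65,0.17,0.38,0.44,0.01,0.01,0.03,0.15,0.01)
c <- c(1.09,3.55,1.07,4.55,0.55,0.11,0.72,0.66,1.22,3.04,
       2.01,0.64,0.47,1.33,3.44)
x <- data.frame(data = c(a, b, c),
                variable = c(rep("a", 20), rep("b", 10), rep("c", 15)),
                area = "random")

# Cut each group's values into 6 equal-width intervals over that group's range.
zz <- x %>%
  group_by(variable) %>%
  mutate(bins = cut(data, breaks = 6)) %>%
  ungroup()

# bins is discrete, so count the rows per interval with geom_bar().
p <- ggplot(zz, aes(x = bins)) +
  geom_bar(fill = "blue") +
  facet_grid(area ~ variable, scales = "free") +
  theme(axis.text.x = element_text(angle = 90))
p
```

One limitation of this approach is that an interval containing no observations is not drawn at all, so a facet can show fewer than 6 bars.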

Andrie answered Oct 13 '22 10:10