
Create a bin for anything above X value in a ggplot2 histogram

Using ggplot2, I want to create a histogram where anything above X is grouped into the final bin. For example, if most of my distribution was between 100 and 200, and I wanted to bin by 10, I would want anything above 200 to be binned in "200+".

# create some fake data
library(ggplot2)
id <- sample(1:100000, 10000, replace = TRUE)
visits <- sample(1:1200, 10000, replace = TRUE)

# combine into a data frame
df <- data.frame(id, visits)

# plot the data (named p so it does not mask base::hist)
p <- ggplot(df, aes(x = visits)) + geom_histogram(binwidth = 50)

How can I limit the x-axis while still representing the data that falls beyond that limit?

Asked by mikebmassey on Jul 23 '12 at 17:07


1 Answer

Perhaps you're looking for the breaks argument for geom_histogram:

# create some fake data
id <- sample(1:100000, 10000, replace = TRUE)
visits <- sample(1:1200, 10000, replace = TRUE)

# combine into a data frame
df <- data.frame(id, visits)

# plot the data: bins of width 10 up to 200, then one wide bin up to the maximum
library(ggplot2)
ggplot(df, aes(x = visits)) +
  geom_histogram(breaks = c(seq(0, 200, by = 10), max(visits)), position = "identity") +
  coord_cartesian(xlim = c(0, 210))

This would look like the plot below (with the caveats that the fake data looks pretty bad here and that the axis labels would need to be adjusted to match the breaks):

[Figure: manual breaks on histogram]
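As an alternative to one very wide final break, here is a minimal sketch that moves every value above the cutoff into a single extra bin of normal width and labels that bin "200+". It is not the method from the answer above. The names `cutoff`, `binwidth`, `visits_capped`, `tick_pos` and `tick_lbls` are made up for illustration, and the seed is only there for reproducibility.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(visits = sample(1:1200, 10000, replace = TRUE))

cutoff   <- 200  # everything above this goes into the overflow bin
binwidth <- 10

# Move every value above the cutoff to the middle of one extra bin,
# (cutoff, cutoff + binwidth], so the overflow bar is as wide as the others.
df$visits_capped <- ifelse(df$visits > cutoff, cutoff + binwidth / 2, df$visits)

# Put a tick every 50 units and label the overflow bin's left edge "200+".
tick_pos  <- seq(0, cutoff, by = 50)
tick_lbls <- c(as.character(head(tick_pos, -1)), paste0(cutoff, "+"))

p <- ggplot(df, aes(x = visits_capped)) +
  geom_histogram(breaks = seq(0, cutoff + binwidth, by = binwidth)) +
  scale_x_continuous(breaks = tick_pos, labels = tick_lbls)
```

Calling `print(p)` draws the plot. Because all bars share one width, the "200+" bar can be compared with the others by height, at the cost of hiding how far the tail extends.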

Edit:

Maybe someone else can weigh in here:

# create breaks and labels
brks <- c(seq(0, 200, by=10), max(visits))
lbls <- c(as.character(seq(0, 190, by=10)), "200+", "")
# TRUE: breaks and labels have the same length
length(brks)==length(lbls)

# hmmm, this call still fails
ggplot(df, aes(x=visits)) +
  geom_histogram(breaks=brks, position = "identity") +
  coord_cartesian(xlim=c(0,220)) +
  scale_x_continuous(labels=lbls)

The plot errors with:

Error in scale_labels.continuous(scale) : 
  Breaks and labels are different lengths

This looks like a previously reported issue, but that one was fixed 8 months ago.
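One likely explanation, offered as an assumption rather than a confirmed diagnosis: `scale_x_continuous()` receives the labels but not the break positions they belong to, so the scale computes its own breaks and their count no longer matches `lbls`. A minimal sketch that passes both together (the seed is added only for reproducibility):

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(visits = sample(1:1200, 10000, replace = TRUE))

# create breaks and labels of equal length
brks <- c(seq(0, 200, by = 10), max(df$visits))
lbls <- c(as.character(seq(0, 190, by = 10)), "200+", "")

# give the scale the same break positions as the histogram,
# so each label has a tick to attach to
p <- ggplot(df, aes(x = visits)) +
  geom_histogram(breaks = brks, position = "identity") +
  scale_x_continuous(breaks = brks, labels = lbls) +
  coord_cartesian(xlim = c(0, 220))
```

Calling `print(p)` draws the plot. The final break at `max(df$visits)` lies outside the zoomed range, so its empty label is never drawn, and the "200+" label sits at the left edge of the last bar.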

Answered by mindless.panda on Sep 22 '22 at 03:09