 

How to separate the two leftmost bins of a histogram in R

Tags: r, histogram

Suppose I need to plot a dataset like the one below:

set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)
hist(dataset)

As you can see in the plot below, the two leftmost bins have no space between them, unlike the rest of the bins.

[Image: output of hist(dataset)]

I tried changing xlim, but it didn't work. Basically, I would like each number (1 to 7) to be represented by its own bin, and I would also like every pair of adjacent bins to have space between them. Thanks!
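To see why only the first two bars touch, you can ask hist() for its bins without drawing anything (plot = FALSE). This is a sketch on the question's data, assuming the default breaks = "Sturges", which should give break points every 0.5 from 1 to 7 here. Because the cells are right-closed, the value 1 falls in [1, 1.5] and the value 2 in the very next cell, (1.5, 2], whereas each later integer k sits in (k - 0.5, k] with an empty cell in front of it.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Compute the histogram with the default breaks, but skip drawing it
h <- hist(dataset, plot = FALSE)

h$breaks  # break points chosen by the default "Sturges" rule
h$counts  # observations per bin; empty bins show up as gaps in the plot

# Bins 1 and 2 both hold data (the values 1 and 2), so their bars touch;
# bin 3, (2, 2.5], is empty and produces the first visible gap
which(h$counts > 0)
```

In other words, the gaps in the default plot are just empty bins, not spacing that hist() adds between bars.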

asked Jan 18 '13 04:01 by Alex



People also ask

How do you change the breaks in a histogram in R?

Note that when you supply breakpoints, R by default makes the histogram cells right-closed (left-open) intervals of the form (a,b]. You can change this with the right=FALSE option, which makes the intervals of the form [a,b) instead. This matters if many of your points lie exactly on a breakpoint, as happens with integer data.
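A small sketch of the difference, using made-up data in which every point sits exactly on a breakpoint (x and br are illustrative values only). With the default include.lowest = TRUE, the outermost cell is also closed at its outer edge, so no point at the ends is dropped.

```r
x  <- c(1, 2, 2, 3)   # illustrative data: every value is a break point
br <- c(0, 1, 2, 3)

# Default right = TRUE: cells are [0,1], (1,2], (2,3]
h_right <- hist(x, breaks = br, plot = FALSE)

# right = FALSE: cells are [0,1), [1,2), [2,3]
# (include.lowest = TRUE now closes the last cell on the right)
h_left <- hist(x, breaks = br, right = FALSE, plot = FALSE)

h_right$counts
h_left$counts
```

The same four points are counted as 1, 2, 1 in the first case and 0, 1, 3 in the second, purely because of which side of each cell is closed.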

What does breaks do in R histogram?

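The breaks argument controls the number of bars (cells, or bins) in the histogram. The default is breaks = "Sturges", which picks roughly log2(n) + 1 bins for n observations and is a reasonable starting point for many datasets. You can also pass a single number, but R treats it only as a suggestion and rounds the break points to "pretty" values. If you do set the number manually, make sure it is not so high that the bins become too sparse to read.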
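As a sketch on the question's data: asking for 3 bins should not give exactly 3, because the requested number is passed on to pretty(), which here chooses break points 0, 2, 4, 6, 8, i.e. four bins.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Request 3 bins; R treats the number as a hint and uses "pretty" break points
h3 <- hist(dataset, breaks = 3, plot = FALSE)

h3$breaks          # break points R actually used
length(h3$counts)  # number of bins actually used
```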

How do you set the number of bins in a histogram in R?

To change the number of bins in a base R histogram, use the breaks argument of the hist() function. Fixing the number of bars (cells, or bins) that the whole histogram is divided into indirectly sets how wide the bars are: more bins give narrower bars, and fewer bins give wider ones.
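For exact control you can pass a vector of break points instead of a number. The sketch below (question's data again) uses one bin of width 1 around each integer, so each value from 1 to 7 gets its own bar; note that these bars still touch, because neighbouring bins share an edge.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# One bin of width 1 centred on each integer: [0.5,1.5], (1.5,2.5], ..., (6.5,7.5]
h1 <- hist(dataset, breaks = seq(0.5, 7.5, by = 1), plot = FALSE)

h1$counts       # one count per value 1..7
table(dataset)  # the same numbers, for comparison
```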


1 Answer

The best way is to set the breaks argument manually. Using the data from your code,

hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4))

gives the following plot:

[Image: histogram of dataset with a gap between every pair of adjacent bars]

The first part, rep(1:7,each=2), gives the values you want the bars centered on. The second part controls how wide the bars are: with c(-.49,.49) they'll almost touch, and with c(-.3,.3) you get narrower bars. If you set it to c(-.5,.5), R yells at you, because the breaks would then contain the same number twice (e.g. 1.5 would be both the right edge of the first bar and the left edge of the second), and the breaks must be strictly increasing.
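Here is a sketch that wraps the expression in a small helper so the half-width is easy to vary. The name gap_breaks is hypothetical and not part of R; the breaks it returns are exactly rep(centres, each = 2) + c(-half_width, half_width).

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Hypothetical helper (not part of base R): break points that put a bar of
# width 2 * half_width around each centre, with narrow gap bins in between
gap_breaks <- function(centres, half_width) {
  rep(centres, each = 2) + c(-half_width, half_width)
}

br <- gap_breaks(1:7, 0.4)
h  <- hist(dataset, breaks = br, plot = FALSE)

diff(br)   # alternating bar widths (0.8) and gap widths (0.2)
h$counts   # odd-numbered bins hold the data; even-numbered (gap) bins are empty

# With half_width = 0.5 the points 1.5, 2.5, ... appear twice,
# so hist() stops with an error instead of returning a histogram
res <- tryCatch(hist(dataset, breaks = gap_breaks(1:7, 0.5), plot = FALSE),
                error = function(e) e)
inherits(res, "error")
```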

Why does this work?

If you split up the breaks vector, you get one part that looks like this:

> rep(1:7,each=2)
 [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7

and a second part that looks like this:

> c(-.4,.4)
 [1] -0.4  0.4

When you add them together, R recycles the shorter vector, repeating c(-.4,.4) as many times as needed to match the length of the first vector. So you end up with

  1-0.4  1+0.4  2-0.4  2+0.4  3-0.4  3+0.4 [etc.]
=   0.6    1.4    1.6    2.4    2.6    3.4 [etc.]

Thus, you have one bar from 0.6 to 1.4, centered on 1 with width 2*.4, another bar from 1.6 to 2.4, centered on 2 with width 2*.4, and so on. The gaps are themselves bins (1.4 to 1.6, 2.4 to 2.6, and so on) that just happen to be empty. If you had data in between (e.g. 2.5), the histogram would look kind of silly, because a bar would appear in the bin from 2.4 to 2.6, and the bar widths would not be even (that bar would be only .2 wide, while all the others are .8). But with only integer values, that's not a problem.
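The sketch below checks the recycling step and then shows the "in between" case on made-up data (x, br3 and the value 2.5 are illustrative only). It also reads hist()'s equidist and density components, which explain what the bar heights mean when bin widths differ.

```r
# Recycling: the length-2 offsets are repeated along the length-14 vector,
# which is the same as interleaving the left and right bar edges
centres <- 1:7
br <- rep(centres, each = 2) + c(-0.4, 0.4)
all.equal(br, as.vector(rbind(centres - 0.4, centres + 0.4)))

# Made-up data with one non-integer value, 2.5
x   <- c(1, 2, 2.5, 3)
br3 <- rep(1:3, each = 2) + c(-0.4, 0.4)  # 0.6 1.4 1.6 2.4 2.6 3.4
h   <- hist(x, breaks = br3, plot = FALSE)

h$counts    # 2.5 lands in the narrow (2.4, 2.6] gap bin
h$equidist  # FALSE, because bar bins (0.8) and gap bins (0.2) differ in width
h$density   # counts / (n * width): the 0.2-wide bin comes out 4 times taller
```

Because equidist is FALSE for breaks like these, hist() defaults to freq = FALSE, so the bar heights in the plot above are densities rather than counts. Passing freq = TRUE draws counts instead, although R then warns that the bar areas are misleading.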

answered Nov 19 '22 14:11 by Jonathan Christensen
