Suppose I need to plot a dataset like below:
set.seed(1)
dataset <- sample(1:7, 1000, replace=TRUE)
hist(dataset)
As you can see in the plot below, the two leftmost bins have no space between them, unlike the rest of the bins.
I tried changing xlim, but it didn't work. Basically, I would like each number (1 to 7) to be represented as a bin, and I would also like any two adjacent bins to have space between them. Thanks!
Note that when giving breakpoints, the default for R is that the histogram cells are right-closed (left open) intervals of the form (a,b]. You can change this with the right=FALSE option, which would change the intervals to be of the form [a,b). This is important if you have a lot of points exactly at the breakpoint.
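The effect of interval closure on integer data can be checked without drawing anything. The sketch below is only an illustration: the breaks at every 0.5 are an assumption chosen to make the point, not necessarily the ones the default rule picks for this data, although they may explain why the two leftmost bars touch.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Assumed breaks for illustration: 1, 1.5, 2, ..., 7 (12 cells)
brk <- seq(1, 7, by = 0.5)

# Default: right-closed cells (a, b]. Because include.lowest = TRUE by default,
# the first cell is [1, 1.5], so the values 1 and 2 land in adjacent cells.
h_right <- hist(dataset, breaks = brk, plot = FALSE)

# right = FALSE: left-closed cells [a, b). Now the last cell is [6.5, 7],
# so the values 6 and 7 land in adjacent cells instead.
h_left <- hist(dataset, breaks = brk, right = FALSE, plot = FALSE)

# Indices of the non-empty cells under each convention
which(h_right$counts > 0)
which(h_left$counts > 0)
```

With these breaks, changing right only moves the pair of touching bars from the left end to the right end; it does not put a gap between every pair of bars.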
The breaks argument controls the number of bars, cells, or bins of the histogram. By default, breaks = "Sturges". The default works well in most cases. If you specify the number of breaks manually, make sure the number is not too high. Note that a single number is only a suggestion: R rounds the breakpoints to "pretty" values, so the actual number of bins may differ.
To change the number of bins in a histogram in base R, use the breaks argument of the hist() function. It increases or decreases the width of the bars by fixing the number of bars, cells, or bins the histogram is divided into.
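As a small sketch of passing breaks as an explicit vector rather than a single number: placing the breakpoints halfway between the integers gives exactly one bin per value. The bars still touch, so this covers only the "one bin per number" part of the question. The variable names here are illustrative.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Breakpoints at 0.5, 1.5, ..., 7.5: each integer sits in the middle of its own cell
h <- hist(dataset, breaks = seq(0.5, 7.5, by = 1), plot = FALSE)

h$mids    # cell midpoints
h$counts  # one count per integer value, the same numbers table(dataset) reports
```

Drop plot = FALSE to draw the histogram instead of only computing it.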
The best way is to set the breaks argument manually. Using the data from your code,
hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4))
gives the following plot:
The first part, rep(1:7,each=2), is what numbers you want the bars centered around. The second part controls how wide the bars are; if you change it to c(-.49,.49) they'll almost touch, if you change it to c(-.3,.3) you get narrower bars. If you set it to c(-.5,.5) then R yells at you because you aren't allowed to have the same number in your breaks vector twice.
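If you want to experiment with different bar widths, the two parts can be wrapped in a small function. This is a sketch only: spaced_breaks is a hypothetical helper, not part of base R, and it assumes at least two distinct centers.

```r
# Hypothetical helper (not in base R): breaks for bars centered on `centers`,
# each extending `half_width` to either side of its center.
spaced_breaks <- function(centers, half_width) {
  centers <- sort(centers)
  # The half-width must be positive and small enough that neighbouring bars
  # do not meet (which would repeat a breakpoint).
  stopifnot(half_width > 0, half_width < min(diff(centers)) / 2)
  rep(centers, each = 2) + c(-half_width, half_width)
}

spaced_breaks(1:3, 0.3)
# 0.7 1.3 1.7 2.3 2.7 3.3
```

For the data in the question, hist(dataset, breaks = spaced_breaks(1:7, 0.4)) should reproduce the call above.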
Why does this work?
If you split up the breaks vector, you get one part that looks like this:
> rep(1:7,each=2)
[1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7
and a second part that looks like this:
> c(-.4,.4)
[1] -0.4 0.4
When you add them together, R loops through the second vector as many times as needed to make it as long as the first vector. So you end up with
1-0.4 1+0.4 2-0.4 2+0.4 3-0.4 3+0.4 [etc.]
= 0.6 1.4 1.6 2.4 2.6 3.4 [etc.]
Thus, you have one bar from 0.6 to 1.4 (centered around 1, with width 2*.4), another bar from 1.6 to 2.4 (centered around 2, with width 2*.4), and so on. If you had data in between (e.g. 2.5), the histogram would look kind of silly, because it would create a bar from 2.4 to 2.6, and the bar widths would not be even (that bar would be only .2 wide, while all the others are .8). But with only integer values that's not a problem.
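The recycling and the resulting cells can be checked directly. This is a quick sketch using the same data; the variable names are just for illustration.

```r
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

centers <- rep(1:7, each = 2)  # 1 1 2 2 ... 7 7
offsets <- c(-0.4, 0.4)        # recycled 7 times when added to centers
brk <- centers + offsets       # 0.6 1.4 1.6 2.4 ... 6.6 7.4

# Cell widths alternate between 0.8 (the bars) and 0.2 (the gaps)
widths <- diff(brk)

h <- hist(dataset, breaks = brk, plot = FALSE)

bar_counts <- h$counts[c(TRUE, FALSE)]  # odd-numbered cells: the visible bars
gap_counts <- h$counts[c(FALSE, TRUE)]  # even-numbered cells: the gaps

bar_counts
gap_counts  # all zero for integer data
```

One thing to keep in mind: because these breaks are not equally spaced, hist() draws densities rather than counts on the y-axis by default. The bars all have the same width here, so their relative heights are still comparable.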