
Histogram function in R - breaks argument not working

Tags: plot, r, histogram

I would like to create one histogram object per row of my data, each with a defined number of breaks:

Hist <- list()
for (i in 1:10) {
  Hist[[i]] <- hist(data[i, ], breaks = 25)
}

But there is a discrepancy between the number of breaks I request and the number of breaks in the output. The number of breaks also differs from one histogram to the next.

Is there a reason for that?

inalei asked Mar 21 '16 15:03




1 Answer

To get consistent breaks, specify a vector of breakpoints rather than a single integer, as you might have expected.
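A minimal sketch of that approach, assuming `data` is a numeric matrix (the random matrix below is only a hypothetical stand-in for the asker's data). If `data` were a data frame, each row would probably need converting first, for example with `as.numeric(unlist(data[i, ]))`.

```r
set.seed(1)
# Hypothetical stand-in for the asker's `data`: 10 rows of 200 values each
data <- matrix(rnorm(10 * 200), nrow = 10)

# One shared vector of 26 breakpoints defines exactly 25 cells
# and spans the full range of all rows
brks <- seq(min(data), max(data), length.out = 26)

# Build one histogram object per row without drawing it
Hist <- lapply(seq_len(nrow(data)), function(i) {
  hist(data[i, ], breaks = brks, plot = FALSE)
})

# Number of cells in each histogram
sapply(Hist, function(h) length(h$counts))
```

Because every call receives the same `brks` vector, the cells line up across rows, which also makes the counts directly comparable.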

Yes, there is a reason ;) From the histogram help page (`?hist`):

`breaks` can be one of:
  
  1. a vector giving the breakpoints between histogram cells,
  2. a function to compute the vector of breakpoints,
  3. a single number giving the number of cells for the histogram,
  4. a character string naming an algorithm to compute the number of cells (see ‘Details’),
  5. a function to compute the number of cells.

In cases 3, 4 and 5 the number is only a suggestion; the breakpoints will be set to pretty values. If breaks is a function, the x vector is supplied to it as the only argument.


Note the key part: when `breaks` is a single number, it is only a suggestion.
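To see the "suggestion" behaviour directly, the sketch below compares the breaks returned by `hist()` with those from `pretty()`. The sample vector `x` is hypothetical, and the `min.n = 1` argument reflects my reading of how `hist.default` calls `pretty()`, so treat that detail as an assumption.

```r
set.seed(42)
x <- rnorm(200)  # hypothetical sample data

# Request 25 cells, without plotting
h <- hist(x, breaks = 25, plot = FALSE)

# pretty() chooses round breakpoints covering the range of x;
# the requested count n is only a target
p <- pretty(range(x), n = 25, min.n = 1)

length(h$breaks) - 1              # cells actually used, may differ from 25
isTRUE(all.equal(h$breaks, p))    # do hist()'s breaks match pretty()'s?
```

Since the rounded breakpoints depend on the range of each input, rows with different ranges can end up with different numbers of cells even when the same `breaks = 25` is passed.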

csgillespie answered Oct 02 '22 23:10