Understanding hist() and break intervals in R [duplicate]

Tags: plot, r, histogram

I've recently started using R and I don't think I'm understanding the hist() function well. I'm currently working with a numeric vector of length 296, and I'd like to divide it up into 10 equal intervals, and produce a frequency histogram to see which values fall into each interval. I thought hist(dataset, breaks = 10) would do the job, but it's dividing it into 12 intervals instead. I obviously misunderstood what breaks does.

If I want to divide up my data into 10 intervals in my histogram, how should I go about doing that? Thank you.

Bobby asked Mar 22 '18 21:03


2 Answers

As per the documentation, if you give the breaks argument a single number, it is treated only as a suggestion, because hist() rounds the breakpoints to "pretty" values. If you want to force exactly 10 equally spaced bins, the easiest way is probably the following:

x = rnorm(50)
hist(x, breaks = seq(min(x), max(x), length.out = 11))

length.out should be n + 1, where n is the desired number of bins, because n bins need n + 1 breakpoints.
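To check that this really yields 10 bins, you can inspect the object hist() returns instead of plotting it. This is a minimal sketch: the seed, the simulated x standing in for the real 296-value vector, and the names n_bins and h are assumptions for illustration.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(296)                   # stand-in for the real numeric vector
n_bins <- 10                      # desired number of bins

# n_bins bins need n_bins + 1 equally spaced breakpoints from min to max
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# plot = FALSE returns the "histogram" object without drawing it
h <- hist(x, breaks = breaks, plot = FALSE)

length(h$counts)                  # number of bins actually used
sum(h$counts)                     # number of observations counted
```

The counts per interval are in h$counts and the interval edges in h$breaks, so the same object answers "which values fall into each interval" as well as drawing the plot.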

ClancyStats answered Oct 10 '22 22:10


If you read help(hist) you will find this explanation:

breaks: one of:

• a vector giving the breakpoints between histogram cells,

• a function to compute the vector of breakpoints,

• a single number giving the number of cells for the histogram,

• a character string naming an algorithm to compute the number of cells (see ‘Details’),

• a function to compute the number of cells.

In the last three cases the number is a suggestion only; as the breakpoints will be set to ‘pretty’ values, the number is limited to ‘1e6’ (with a warning if it was larger). If ‘breaks’ is a function, the ‘x’ vector is supplied to it as the only argument (and the number of breaks is only limited

So the help specifically says that if you provide the function with a number it will only be used as a suggestion.
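To see where the "suggestion" comes from, you can compare the breaks hist() picks with what pretty() returns for the same range. This is a sketch under the assumption (based on my reading of hist.default) that a single number is passed on as pretty(range(x), n = breaks, min.n = 1); the seed and the simulated x are placeholders for the real data.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(296)                   # stand-in for the real numeric vector

# What hist() is assumed to do internally with a single number
suggested <- pretty(range(x), n = 10, min.n = 1)

# Ask hist() for "10" bins without drawing the plot
h <- hist(x, breaks = 10, plot = FALSE)

h$breaks                          # round, "pretty" breakpoints
length(h$breaks) - 1              # actual number of bins, not necessarily 10
isTRUE(all.equal(h$breaks, suggested))
```

Because pretty() prefers round breakpoints over an exact count, the number of bins depends on the range of the data, which is how a request for 10 can turn into 12.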

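One possible solution is to provide the break points yourself, like so (note that ten break points give nine bins, and the breaks must cover the whole range of x):

x <- rnorm(296)
# hist() stops with an error if any value of x falls outside the breaks
hist(x, breaks = c(-4, -3, -2, -1, 0, 1, 2, 3, 4, 5))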

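If you don't want to do that but instead want to specify the number of bins, you can use the cut function, which splits x into a factor with that many equal-width intervals; plotting the factor gives a bar plot of the counts: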

plot(cut(x, 10))
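
If you want the counts themselves rather than a plot, the factor returned by cut can be tabulated. A minimal sketch, with the seed, the simulated x, and the names bins and counts as illustrative assumptions:

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- rnorm(296)                   # stand-in for the real numeric vector

# cut() with a single number makes that many equal-width intervals,
# widening the range slightly so the extreme values are included
bins <- cut(x, 10)

counts <- table(bins)             # observations per interval
nlevels(bins)                     # number of intervals
counts
```

Note that the result of plot(cut(x, 10)) is a bar plot of a factor, so the x-axis shows interval labels rather than a numeric scale as hist() does.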

Karolis Koncevičius answered Oct 10 '22 23:10