I've recently started using R and I don't think I'm understanding the hist() function well. I'm currently working with a numeric vector of length 296, and I'd like to divide it up into 10 equal intervals, and produce a frequency histogram to see which values fall into each interval. I thought hist(dataset, breaks = 10) would do the job, but it's dividing it into 12 intervals instead. I obviously misunderstood what breaks does.
If I want to divide up my data into 10 intervals in my histogram, how should I go about doing that? Thank you.
The breaks argument controls the number of bars (also called cells or bins) of the histogram. By default, breaks = "Sturges"; this default method is suitable in most cases.
Description: the generic function hist computes a histogram of the given data values. If plot = TRUE, the resulting object of class "histogram" is plotted by plot.histogram before it is returned.
The object returned by hist() shows the breaks, which are the cutoff points for the bins. It also shows the counts and the intensities/density for each bin (the same values under two names, kept for compatibility between R versions), the midpoints of each bin, the name of the variable, whether the bins are equidistant, and the class of the object.
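The components described above can be inspected directly. The following is a minimal sketch, assuming simulated data in place of the asker's vector (the seed and the rnorm() call are illustrative only):

```r
set.seed(1)                  # arbitrary seed so the example is reproducible
x <- rnorm(296)              # stand-in data, not the asker's actual vector

h <- hist(x, plot = FALSE)   # compute the histogram without drawing it

names(h)      # components stored in the object
h$breaks      # cutoff points between bins
h$counts      # number of observations in each bin
h$mids        # midpoint of each bin
h$equidist    # TRUE when all bins have the same width
class(h)      # "histogram"
```

There is always one more breakpoint than there are bins, which is worth keeping in mind when supplying breaks by hand.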
As per the documentation, if you give the breaks argument a single number, it is treated only as a suggestion, because hist() sets the breakpoints to pretty values. If you want to force 10 equally spaced bins, the easiest way is probably the following:
x = rnorm(50)
hist(x, breaks = seq(min(x), max(x), length.out = 11))
The length of the breaks vector should be n+1, where n is the number of desired bins.
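To check that this really yields the requested number of bins, the histogram can be computed without plotting and its components examined. This is a sketch on simulated data of the same length as the asker's vector; n_bins is a name introduced here for illustration:

```r
set.seed(42)                 # arbitrary seed for reproducibility
x <- rnorm(296)              # stand-in for the asker's data
n_bins <- 10                 # desired number of bins (illustrative name)

# n_bins + 1 equally spaced breakpoints spanning the full range of x
brk <- seq(min(x), max(x), length.out = n_bins + 1)

h <- hist(x, breaks = brk, plot = FALSE)

length(h$counts)   # 10
diff(h$breaks)     # bin widths, all equal up to floating-point error
h$counts           # frequency in each of the 10 bins
```

Dropping plot = FALSE draws the same 10-bin histogram.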
If you read help(hist), you will find this explanation:
breaks: one of:
• a vector giving the breakpoints between histogram cells,
• a function to compute the vector of breakpoints,
• a single number giving the number of cells for the histogram,
• a character string naming an algorithm to compute the number of cells (see ‘Details’),
• a function to compute the number of cells.
In the last three cases the number is a suggestion only; as the breakpoints will be set to ‘pretty’ values, the number is limited to ‘1e6’ (with a warning if it was larger). If ‘breaks’ is a function, the ‘x’ vector is supplied to it as the only argument (and the number of breaks is only limited
So the help specifically says that if you provide the function with a number it will only be used as a suggestion.
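A quick way to see the "suggestion" behaviour is to compare the breakpoints hist() actually uses with those from pretty(). This sketch uses simulated data, and the call to pretty() with min.n = 1 reflects my reading of how hist.default picks the breakpoints, so treat that detail as an assumption:

```r
set.seed(123)                # arbitrary seed for reproducibility
x <- rnorm(296)              # stand-in for the asker's data

h <- hist(x, breaks = 10, plot = FALSE)

length(h$counts)   # number of bins actually used; need not be 10
h$breaks           # rounded, "pretty" breakpoints

# Assumed equivalent of what hist() does with a single number
p <- pretty(range(x), n = 10, min.n = 1)
all.equal(h$breaks, p)
```

Because the breakpoints are rounded to convenient values, the bin count depends on the range of the data, which would explain seeing 12 bins after asking for 10.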
One possible solution is to provide the break points yourself like so:
x <- rnorm(296)
hist(x, breaks = seq(-5, 5, by = 1))  # 11 breakpoints give 10 bins; the breaks must span range(x)
If you don't want to do that but instead want to specify the number of bins, you can use the cut function:
plot(cut(x, 10))
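Since the question also asks which values fall into each interval, the factor returned by cut() can be tabulated or used to group the data. This is a minimal sketch on simulated data; bins is a name introduced for illustration:

```r
set.seed(7)                  # arbitrary seed for reproducibility
x <- rnorm(296)              # stand-in for the asker's data

bins <- cut(x, breaks = 10)  # factor assigning each value to one of 10 intervals

nlevels(bins)                # 10
table(bins)                  # frequency of each interval
by_bin <- split(x, bins)     # list of the actual values in each interval
lengths(by_bin)              # same frequencies as table(bins)
```

Note that when cut() is given a number of intervals, it widens the outermost breakpoints slightly so that the minimum and maximum are included. The first and last intervals are therefore very slightly wider than the others.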