 

Divide a range of values into bins of equal length: cut vs cut2

Tags:

r

hmisc

I'm using the cut function to split my data into bins of equal length. It does the job, but I'm not happy with the way it returns the values: what I need is the center of each bin, not its lower and upper ends.
I've also tried cut2 from Hmisc. It gives me the center of each bin, but it divides the range of the data into bins that contain the same number of observations rather than bins of the same length.

Does anyone have a solution to this?
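A small base-R sketch of the distinction described above, using a made-up, skewed vector x (Hmisc is not needed): equal-width breaks come from spacing k + 1 edges evenly over range(x), while equal-count breaks come from sample quantiles, which is roughly the behaviour the question reports for cut2. Note that cut(x, breaks = k) with a single number also uses equal widths but moves the outer limits out by 0.1% of the range, so explicit edges plus include.lowest = TRUE are used here to keep the numbers easy to read.

```r
# Made-up, skewed example data so the two binning rules disagree
x <- c(1:9, 40)
k <- 3  # number of bins

# Equal width: k + 1 evenly spaced edges from min(x) to max(x)
width_breaks <- seq(min(x), max(x), length.out = k + 1)

# Equal count: edges at the 0, 1/3, 2/3 and 1 sample quantiles
count_breaks <- unname(quantile(x, probs = seq(0, 1, length.out = k + 1)))

# include.lowest = TRUE keeps min(x) in the first bin
width_bins <- cut(x, width_breaks, include.lowest = TRUE)
count_bins <- cut(x, count_breaks, include.lowest = TRUE)

table(width_bins)  # bins of equal length, unequal counts
table(count_bins)  # bins of (roughly) equal counts, unequal lengths
```

With these ten values the equal-width bins hold 9, 0 and 1 observations, while the quantile-based bins hold 4, 3 and 3.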

matteo asked May 06 '11 19:05




2 Answers

It's not too hard to make the breaks and labels yourself, with something like this. Since each midpoint is a single number, I don't return a factor with labels but a numeric vector instead.

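cut2 <- function(x, breaks) {
  r <- range(x)
  # 2*breaks + 1 equally spaced points: odd positions are bin edges,
  # even positions are bin midpoints
  b <- seq(r[1], r[2], length.out = 2*breaks + 1)
  brk <- b[seq(1, 2*breaks + 1, by = 2)]
  mid <- b[seq(2, 2*breaks, by = 2)]
  # include.lowest = TRUE keeps min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  # replace each bin index with that bin's midpoint
  mid[k]
}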

There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
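One possibly tidier way to get the breaks and midpoints, offered as a sketch rather than as the answerer's code: compute the k + 1 edges directly, take each midpoint as the average of neighbouring edges, and let findInterval map values to bin indices. The name bin_center is hypothetical. The arguments left.open = TRUE and rightmost.closed = TRUE make the bins closed on the right with the lowest edge included, which should match cut(..., include.lowest = TRUE) with its default right = TRUE.

```r
# Hypothetical helper: map each value of x to the center of its
# equal-width bin, using k bins spanning range(x)
bin_center <- function(x, k) {
  edges <- seq(min(x), max(x), length.out = k + 1)
  # midpoint of each bin = average of its left and right edge
  mids <- (head(edges, -1) + tail(edges, -1)) / 2
  # left.open = TRUE gives (a, b] bins; with it, rightmost.closed = TRUE
  # closes the leftmost bin so min(x) gets index 1
  idx <- findInterval(x, edges, left.open = TRUE, rightmost.closed = TRUE)
  mids[idx]
}

bin_center(c(0, 5, 10, 15, 20), 2)  # returns c(5, 5, 5, 15, 15)
```

Here the value 10 sits exactly on the interior edge and goes to the lower bin, as it would with cut's default right-closed intervals.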

Note that this answer is different from Joshua's: his gives the median of the data in each bin, while this gives the center of each bin.

> head(cut2(x,3))
[1] 16.666667  3.333333 16.666667  3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18  2 18  2 18 18
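The same contrast on a tiny made-up vector where the numbers are easy to check by hand. Both approaches use the same two bins here, [0, 5] and (5, 10], so only the summary differs: the bin centers are 2.5 and 7.5, while the medians of the data falling in each bin are 1 and 10.

```r
x <- c(0, 1, 2, 10)
edges <- seq(min(x), max(x), length.out = 3)   # 0, 5, 10
bins  <- cut(x, edges, include.lowest = TRUE)

# center of the bin each value falls in (the approach in this answer)
centers <- ((head(edges, -1) + tail(edges, -1)) / 2)[as.integer(bins)]

# median of the data in the bin each value falls in (the ave approach)
medians <- ave(x, bins, FUN = median)

cbind(x, centers, medians)
```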
Aaron left Stack Overflow answered Sep 19 '22 01:09


Use ave like so:

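# example data: 100 integers drawn from 0 to 20
set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
# replace each value with the median of the data in its equal-width bin
xCenter <- ave(x, cut(x,3), FUN=median)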
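ave(x, g, FUN = median) applies FUN within each group of g and writes the group result back into every position of that group, so the output has the same length as x. As an illustration of that behaviour (not of how ave is implemented internally), the same vector can be built from tapply by looking up each value's bin median through the factor's integer codes; bins, bin_medians and xCenter2 are illustrative names.

```r
set.seed(21)
x <- sample(0:20, 100, replace = TRUE)
bins <- cut(x, 3)

# one median per bin, as a 1-d array indexed by bin
bin_medians <- tapply(x, bins, median)

# look up each value's bin median via the factor's integer codes;
# as.vector drops the array's dim and dimnames
xCenter2 <- as.vector(bin_medians[as.integer(bins)])

all.equal(xCenter2, ave(x, bins, FUN = median))  # TRUE
```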
Joshua Ulrich answered Sep 19 '22 01:09