I'm using the cut function to split my data into equal-width bins. It does the job, but I'm not happy with the way it returns the values: what I need is the center of each bin, not its lower and upper ends.
I've also tried cut2 {Hmisc}. This gives me the center of each bin, but it divides the range of the data into bins that contain the same number of observations, rather than bins of the same length.
Does anyone have a solution to this?
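For illustration, here is a minimal sketch of the behaviour described above. The vector x is made-up data, not taken from the question.

```r
# Made-up data, for illustration only
x <- c(1, 4, 7, 12, 15, 20)

# Split the range of x into 3 equal-width bins
binned <- cut(x, 3)

# The factor levels are interval labels of the form "(lower,upper]",
# not single numbers such as the bin centers
levels(binned)
```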
It's not too hard to make the breaks and labels yourself, with something like this. Since the midpoint is a single number, I return a numeric vector rather than a factor with labels.
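# Note: this cut2 masks Hmisc::cut2 if Hmisc is loaded
cut2 <- function(x, breaks) {
  r <- range(x)
  # 2*breaks+1 equally spaced points: bin edges alternate with bin centers
  b <- seq(r[1], r[2], length.out = 2 * breaks + 1)
  brk <- b[0:breaks * 2 + 1]  # odd positions: the breaks + 1 bin edges
  mid <- b[1:breaks * 2]      # even positions: the bin centers
  # include.lowest = TRUE keeps min(x) in the first bin
  k <- cut(x, breaks = brk, labels = FALSE, include.lowest = TRUE)
  mid[k]
}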
There's probably a better way to get the bin breaks and midpoints; I didn't think about it very hard.
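One possible alternative, offered as a sketch rather than a definitive version: build only the bin edges, then take the midpoint of each adjacent pair. The name bin_centers and its arguments are hypothetical, not from any package.

```r
# Hypothetical helper: map each value to the center of its equal-width bin
bin_centers <- function(x, n_bins) {
  r <- range(x, na.rm = TRUE)
  # n_bins + 1 equally spaced edges spanning the range of the data
  edges <- seq(r[1], r[2], length.out = n_bins + 1)
  # Center of each bin = mean of its lower and upper edge
  mids <- (head(edges, -1) + tail(edges, -1)) / 2
  # include.lowest = TRUE keeps min(x) in the first bin
  idx <- cut(x, breaks = edges, labels = FALSE, include.lowest = TRUE)
  mids[idx]
}

# Small made-up example: two bins over [0, 20]
example_centers <- bin_centers(c(0, 5, 10, 15, 20), 2)
```

Like the function above, this assumes x contains at least two distinct values; otherwise the edges coincide and cut stops with an error.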
Note that this answer is different from Joshua's: his gives the median of the data in each bin, while this gives the center of each bin.
> head(cut2(x,3))
[1] 16.666667 3.333333 16.666667 3.333333 16.666667 16.666667
> head(ave(x, cut(x,3), FUN=median))
[1] 18 2 18 2 18 18
Use ave like so:
set.seed(21)
x <- sample(0:20, 100, replace=TRUE)
xCenter <- ave(x, cut(x,3), FUN=median)
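To see what ave is doing here, it can help to compare it with tapply on a small vector. This is a sketch with made-up data (x_small is an assumption, not the sample above): tapply returns one median per bin, while ave repeats each bin's median so the result has the same length as the input.

```r
# Made-up data, for illustration only
x_small <- c(0, 2, 4, 9, 11, 18, 20)
bins <- cut(x_small, 3)

# One median per bin, named by the interval label
per_bin <- tapply(x_small, bins, median)

# The same medians, repeated so each element lines up with x_small
per_value <- ave(x_small, bins, FUN = median)
```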