How do I create binned factor variables from a continuous variable, with custom breaks?

Question

I have a vector that looks like this:

dataset <- c(4,7,9,1,10,15,18,19,3,16,10,16,12,22,2,23,16,17)

I would like to create four dummy categories by binning the continuous data with custom breaks, for example: 1:4, 5:9, 10:17, 18:23.

The output dummy categories would have the same length as the original continuous vector (18 in this case), but now each binned dummy variable would just contain a 1 or a 0.

Joshua Ulrich · Accepted Answer

Use cut:

data.frame(dataset, bin=cut(dataset, c(1,4,9,17,23), include.lowest=TRUE))
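The call above returns a single factor column, while the question asks for one 0/1 variable per bin. A minimal sketch of one way to get there, assuming base R's model.matrix is acceptable (this step is not part of the original answer, and the name dummies is just illustrative):

```r
dataset <- c(4,7,9,1,10,15,18,19,3,16,10,16,12,22,2,23,16,17)

# Same binning as in the answer: [1,4], (4,9], (9,17], (17,23]
bin <- cut(dataset, c(1,4,9,17,23), include.lowest=TRUE)

# One 0/1 column per factor level; "- 1" drops the intercept so that
# every bin gets its own column instead of one being the baseline
dummies <- model.matrix(~ bin - 1)

dim(dummies)      # one row per element of dataset, one column per bin
colSums(dummies)  # how many values landed in each bin
```

Each row contains exactly one 1, so the four columns have the same length as the original vector. Note that model.matrix drops rows with NA by default, which does not matter here because every value falls inside the breaks.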

IRTFM · Answer

I agree with Joshua that cut is what most people would think of for this task. I don't happen to like its defaults: I prefer left-closed intervals, and it's a minor pain to set that up correctly with cut (although it can be done). Fortunately for my feeble brain, Frank Harrell has designed a cut2 function in his Hmisc package whose defaults I prefer. A third alternative is findInterval, which is especially suited to problems where you want to use the result as an index into another extraction or selection process. Its results are roughly what you would get if you applied as.numeric to the results of cut:

require(Hmisc)
cut2(dataset, c(1,4,9,17,23) )
 [1] [ 4, 9) [ 4, 9) [ 9,17) [ 1, 4) [ 9,17) [ 9,17) [17,23] [17,23] [ 1, 4) [ 9,17)
[11] [ 9,17) [ 9,17) [ 9,17) [17,23] [ 1, 4) [17,23] [ 9,17) [17,23]
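For comparison, here is a sketch of how the same left-closed binning might be set up with plain cut, assuming the right and include.lowest arguments are used together (the variable name left_closed is only illustrative):

```r
dataset <- c(4,7,9,1,10,15,18,19,3,16,10,16,12,22,2,23,16,17)

# right=FALSE makes the intervals closed on the left: [1,4), [4,9), [9,17)
# include.lowest=TRUE then closes the *last* interval on the right,
# so the maximum (23) is kept instead of becoming NA
left_closed <- cut(dataset, c(1,4,9,17,23), right=FALSE, include.lowest=TRUE)

levels(left_closed)
as.integer(left_closed)
```

With right=FALSE the meaning of include.lowest flips: it applies to the highest break rather than the lowest, which is part of what makes this setup easy to get wrong.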

(Notice that findInterval treats the last break as the closed lower end of an extra interval, so the maximum value, 23, lands in a fifth bin unless you replace the last break with Inf, a reserved word for infinity in R.)

findInterval(dataset, c(1,4,9,17,23))
 [1] 2 2 3 1 3 3 4 4 1 3 3 3 3 4 1 5 3 4
as.numeric( cut(dataset, c(1,4,9,17,Inf), include.lowest=TRUE))
 [1] 1 2 2 1 3 3 4 4 1 3 3 3 3 4 1 4 3 3
as.numeric( cut(dataset, c(1,4,9,17,23), include.lowest=TRUE))
 [1] 1 2 2 1 3 3 4 4 1 3 3 3 3 4 1 4 3 3
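To illustrate the point about using findInterval as an index, here is a small sketch. The label vector is hypothetical and not from the original answer; the last break is replaced with Inf as suggested in the note above.

```r
dataset <- c(4,7,9,1,10,15,18,19,3,16,10,16,12,22,2,23,16,17)

# Inf as the last break keeps 23 in the fourth interval
breaks <- c(1, 4, 9, 17, Inf)

# Hypothetical labels, one per interval
labels <- c("low", "mid", "high", "top")

# Integer interval number for each value, usable directly as an index
idx <- findInterval(dataset, breaks)
labels[idx]
```

If you would rather keep 23 as the last break, findInterval also accepts rightmost.closed=TRUE, which puts a value equal to the last break into the final interval rather than starting a new one.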
