
In R, how do you classify values in one data frame based on ranges in another data frame?

In general, how can I classify the values in one column of a data frame according to ranges, and the labels attached to those ranges, defined in another data frame? For example, given df1 and df2, I would like to generate df3 (or update df1):

> df1
  NewAge
1      5
2     25
3     18
4      9
5     43
6     15
7     17

> df2
  AgeStart AgeEnd AgeType
1        0     10       A
2       10     20       B
3       20     30       A
4       30     40       B
5       40     50       A

I want df3 as:

NewAge Type
     5    A
    25    A
    18    B
     9    A
    43    A
    15    B
    17    B

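I used cut() to generate the intervals:

# breaks must include the final upper bound (50), otherwise 43 falls outside every interval
df2_cut <- data.frame(NewAge = df1$NewAge,
                      AgeRange = cut(df1$NewAge,
                                     breaks = c(df2$AgeStart, max(df2$AgeEnd)),
                                     right = FALSE,
                                     include.lowest = TRUE))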
> df2_cut
  NewAge AgeRange
1      5   [0,10)
2     25  [20,30)
3     18  [10,20)
4      9   [0,10)
5     43  [40,50]
6     15  [10,20)
7     17  [10,20)

but I don't know how to classify df2_cut values according to the interval type (i.e. A or B).

val asked Dec 23 '15 02:12


1 Answer

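We can use findInterval(). For each value of 'NewAge' it returns the numeric index of the interval, defined by the sorted break points in 'AgeStart', that the value falls into. We then use that index to pick the corresponding element of 'AgeType'.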

df3 <- transform(df1, Type=df2$AgeType[findInterval(NewAge, df2$AgeStart)])
df3
#  NewAge Type
#1      5    A
#2     25    A
#3     18    B
#4      9    A
#5     43    A
#6     15    B
#7     17    B
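
To see what the index looks like before it is used for the lookup, the step can be split in two. This is only a sketch: df1 and df2 are rebuilt from the question's printed output, and the handling of out-of-range ages (setting them to NA) is an assumption that the question does not address.

```r
# Data rebuilt from the question's printed output
df1 <- data.frame(NewAge = c(5, 25, 18, 9, 43, 15, 17))
df2 <- data.frame(AgeStart = c(0, 10, 20, 30, 40),
                  AgeEnd   = c(10, 20, 30, 40, 50),
                  AgeType  = c("A", "B", "A", "B", "A"))

# findInterval() requires sorted break points; AgeStart already is sorted
idx <- findInterval(df1$NewAge, df2$AgeStart)
idx
# [1] 1 3 2 1 5 2 2

# Assumption: ages outside [min(AgeStart), max(AgeEnd)] should stay unclassified.
# findInterval() returns 0 below the first break and the last index above the
# last break, so both cases are masked as NA before the lookup.
idx[idx == 0 | df1$NewAge > max(df2$AgeEnd)] <- NA

df3 <- transform(df1, Type = df2$AgeType[idx])
```

Without the masking step, an index of 0 would silently drop an element from the lookup and transform() would fail on the length mismatch, while an age above 50 would be labelled with the last row's type.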

Or use cut() with labels=FALSE, which returns the integer index of each interval instead of a factor.
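
The code for this variant is not shown here, so the following is only a sketch of how it might look. It assumes the same df1 and df2 as in the question, and it assumes the break points are built from AgeStart plus the final AgeEnd, as the question's own cut() output suggests.

```r
# Data rebuilt from the question's printed output
df1 <- data.frame(NewAge = c(5, 25, 18, 9, 43, 15, 17))
df2 <- data.frame(AgeStart = c(0, 10, 20, 30, 40),
                  AgeEnd   = c(10, 20, 30, 40, 50),
                  AgeType  = c("A", "B", "A", "B", "A"))

# Break points: every interval start plus the last upper bound (assumption)
brks <- c(df2$AgeStart, max(df2$AgeEnd))

# labels = FALSE makes cut() return integer interval codes rather than a factor;
# values outside the breaks become NA
idx <- cut(df1$NewAge, breaks = brks, labels = FALSE,
           right = FALSE, include.lowest = TRUE)

df3 <- transform(df1, Type = df2$AgeType[idx])
df3
```

Compared with findInterval(), this form returns NA for ages outside the break points, so no separate out-of-range check is needed.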

akrun answered Oct 12 '22 14:10