
Look up keys from df2 in either of two key columns in df1 and merge the corresponding values

I have df1

df1 <- data.frame(states = c("wash", "mont", "oreg", "cali", "michi"), key1 = c(1,3,5,7,9), key2 = c(2,4,6,8,10))

which looks like this (key1 and key2 are columns of keys):

  states key1 key2
1   wash    1    2
2   mont    3    4
3   oreg    5    6
4   cali    7    8
5  michi    9   10

df2 has additional info

df2 <- data.frame(sample = c(9,8,5,4,1), value = c("steel", "gold", "blue", "grey", "green"))

which looks like this:

  sample value
1      9 steel
2      8  gold
3      5  blue
4      4  grey
5      1 green

Each sample in df2 needs to be matched to EITHER key1 or key2 in df1 to make df3:

  states key1 key2 sample value
1   wash    1    2      1 green
2   mont    3    4      4  grey
3   oreg    5    6      5  blue
4   cali    7    8      8  gold
5  michi    9   10      9 steel

I can then just remove the sample column, which is not a problem. How do I extend df1 to df3 when the value of sample can appear in either key1 or key2?

Thanks!

willnotburn asked Mar 21 '23 04:03



2 Answers

You can use a couple of merge calls, one per key column, and bind their outputs:

rbind(transform(merge(df1, df2, by.x = "key1", by.y = "sample"), sample = key1),
      transform(merge(df1, df2, by.x = "key2", by.y = "sample"), sample = key2))
#   key1 states key2 value sample
# 1    1   wash    2 green      1
# 2    5   oreg    6  blue      5
# 3    9  michi   10 steel      9
# 4    3   mont    4  grey      4
# 5    7   cali    8  gold      8
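
The result above has the right rows but not the row and column order of the df3 shown in the question. If that layout matters, a minimal sketch (assuming the same df1 and df2 as in the question; res is just a name I picked for the intermediate result) is to sort by sample and select the columns explicitly:

```r
df1 <- data.frame(states = c("wash", "mont", "oreg", "cali", "michi"),
                  key1 = c(1, 3, 5, 7, 9), key2 = c(2, 4, 6, 8, 10))
df2 <- data.frame(sample = c(9, 8, 5, 4, 1),
                  value = c("steel", "gold", "blue", "grey", "green"))

# Same two merges as above, stored in an intermediate data frame
res <- rbind(transform(merge(df1, df2, by.x = "key1", by.y = "sample"), sample = key1),
             transform(merge(df1, df2, by.x = "key2", by.y = "sample"), sample = key2))

# Sort by the matched sample and put the columns in the df3 order
df3 <- res[order(res$sample), c("states", "key1", "key2", "sample", "value")]
rownames(df3) <- NULL  # reset row names after reordering
```

Note that merge drops rows of df1 that match neither key by default, so this only reproduces df3 when every state has a match.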

Another approach:

match.idx <- pmax(match(df1$key1, df2$sample),
                  match(df1$key2, df2$sample), na.rm = TRUE)
cbind(df1, df2[match.idx, ])
#   states key1 key2 sample value
# 5   wash    1    2      1 green
# 4   mont    3    4      4  grey
# 3   oreg    5    6      5  blue
# 2   cali    7    8      8  gold
# 1  michi    9   10      9 steel
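
One thing to be aware of with pmax: if both key1 and key2 of a row happen to appear in df2$sample, it picks whichever match has the larger row index in df2, which is somewhat arbitrary. A hedged variant, assuming you would rather prefer key1 and only fall back to key2 (idx1 and idx2 are names introduced here for illustration), could look like this:

```r
df1 <- data.frame(states = c("wash", "mont", "oreg", "cali", "michi"),
                  key1 = c(1, 3, 5, 7, 9), key2 = c(2, 4, 6, 8, 10))
df2 <- data.frame(sample = c(9, 8, 5, 4, 1),
                  value = c("steel", "gold", "blue", "grey", "green"))

idx1 <- match(df1$key1, df2$sample)  # row of df2 matching key1, or NA
idx2 <- match(df1$key2, df2$sample)  # row of df2 matching key2, or NA

# Prefer the key1 match; fall back to the key2 match where key1 has none
match.idx <- ifelse(is.na(idx1), idx2, idx1)

df3 <- cbind(df1, df2[match.idx, ])
rownames(df3) <- NULL  # drop the row names carried over from df2
```

Rows of df1 that match neither key keep an NA index, so they get NA in sample and value rather than being dropped.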
flodel answered Apr 25 '23 17:04



One approach is to create a new key column in df1 that holds whichever of key1 or key2 appears in df2$sample. Then you can join on that column directly. I'll use data.table to illustrate this.

require(data.table) ## >= 1.9.0
setDT(df1)          ## convert data.frame to data.table by reference
setDT(df2)          ## idem

# get the key as a common column
df1[(key1 %in% df2$sample), the_key := key1]
df1[(key2 %in% df2$sample), the_key := key2]

Here := assigns a new column by reference once again (no copy is being made). If both keys of a row match, the second assignment overwrites the first, so key2 wins. Now what remains is just to setkey and join.

# setkey and join
setkey(df1, the_key)
setkey(df2, sample)
df1[df2]

#    states key1 key2 the_key value
# 1:   wash    1    2       1 green
# 2:   mont    3    4       4  grey
# 3:   oreg    5    6       5  blue
# 4:   cali    7    8       8  gold
# 5:  michi    9   10       9 steel
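
As a possible alternative, newer data.table releases (1.9.6 or later, I believe) accept an on= argument, so the join can be written without setkey. This is only a sketch under that version assumption, using the same data as the question. Putting df1 in the i position keeps every row of df1, in its original order, even where no sample matches:

```r
library(data.table)

df1 <- data.table(states = c("wash", "mont", "oreg", "cali", "michi"),
                  key1 = c(1, 3, 5, 7, 9), key2 = c(2, 4, 6, 8, 10))
df2 <- data.table(sample = c(9, 8, 5, 4, 1),
                  value = c("steel", "gold", "blue", "grey", "green"))

# Build the helper key column exactly as above
df1[key1 %in% df2$sample, the_key := key1]
df1[key2 %in% df2$sample, the_key := key2]

# For each row of df1, look up df2 where df2$sample equals df1$the_key;
# the default nomatch = NA keeps rows of df1 that have no match
df3 <- df2[df1, on = c(sample = "the_key")]
```

Because no key is set, neither table gets reordered, which may or may not be what you want.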
Arun answered Apr 25 '23 18:04
