Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intersecting multiple columns between two data frames

Tags:

dataframe

r

I have two data frames with 2 columns in each. For example:

df.1 = data.frame(col.1 = c("a","a","a","a","b","b","b","c","c","d"), col.2 = c("b","c","d","e","c","d","e","d","e","e"))
df.2 = data.frame(col.1 = c("b","b","b","a","a","e"), col.2 = c("a","c","e","c","e","c"))

and I'm looking for an efficient way to look up the row index in df.2 of every col.1 col.2 row pair of df.1. Note that a row pair in df.1 may appear in df.2 in reverse order (for example df.1[1,], which is "a","b" appears in df.2[1,] as "b","a"). That doesn't matter to me. In other words, as long as a row pair in df.1 appears in any order in df.2 I want its row index in df.2, otherwise it should return NA. One more note, row pairs in both data frames are unique - meaning each row pair appears only once.

So for these two data frames the return vector would be:

c(1,4,NA,5,2,NA,3,NA,6,NA)
like image 262
user1701545 Avatar asked Dec 19 '22 14:12

user1701545


1 Answers

Maybe something using dplyr package:

first make the reference frame

  • use row_number() to number as per row index efficiently.
  • use select to "flip" the column vars.

two halves:

df_ref_top <- df.2 %>% mutate(n=row_number())
df_ref_btm <- df.2 %>% select(col.1=col.2, col.2=col.1) %>% mutate(n=row_number())

then bind together:

df_ref <- rbind(df_ref_top,df_ref_btm)

Left join and select vector:

gives to get your answer

left_join(df.1,df_ref)$n
like image 70
npjc Avatar answered Jan 09 '23 08:01

npjc