I have two data frames with 2 columns in each. For example:
df.1 = data.frame(col.1 = c("a","a","a","a","b","b","b","c","c","d"), col.2 = c("b","c","d","e","c","d","e","d","e","e"))
df.2 = data.frame(col.1 = c("b","b","b","a","a","e"), col.2 = c("a","c","e","c","e","c"))
and I'm looking for an efficient way to look up the row index in df.2 of every col.1 col.2 row pair of df.1. Note that a row pair in df.1 may appear in df.2 in reverse order (for example df.1[1,], which is "a","b" appears in df.2[1,] as "b","a"). That doesn't matter to me. In other words, as long as a row pair in df.1 appears in any order in df.2 I want its row index in df.2, otherwise it should return NA. One more note, row pairs in both data frames are unique - meaning each row pair appears only once.
So for these two data frames the return vector would be:
c(1,4,NA,5,2,NA,3,NA,6,NA)
Maybe something using dplyr
package:
row_number()
to number as per row index efficiently. select
to "flip" the column vars.two halves:
df_ref_top <- df.2 %>% mutate(n=row_number())
df_ref_btm <- df.2 %>% select(col.1=col.2, col.2=col.1) %>% mutate(n=row_number())
then bind together:
df_ref <- rbind(df_ref_top,df_ref_btm)
gives to get your answer
left_join(df.1,df_ref)$n
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With