
Merge dataframes by a match in at least one of two columns

Tags: merge, r

I've been searching for a solution and experimenting, but I can't seem to perform what should be a simple task.

I have two data frames formatted similarly to the toy examples below:

DF1 = data.frame(A=c("cats","dogs",NA,"dogs"), B=c("kittens","puppies","kittens",NA), C=c(88,99,101,110))

    A       B           C
1   cats    kittens     88
2   dogs    puppies     99
3   NA      kittens     101
4   dogs    NA          110


DF2 = data.frame(D=c(1,2), A=c("cats","dogs"), B=c("kittens","puppies"))

    D   A       B
1   1   cats    kittens
2   2   dogs    puppies

I wish to merge the two data sets such that the output is:

      A     B         C     D
1   cats    kittens   88    1
2   dogs    puppies   99    2
3   dogs    NA        110   2
4     NA    kittens   101   1

In other words, any rows with labels A=="cats" or B=="kittens" will be mapped to 1 in the column D, any rows with A=="dogs" or B=="puppies" will be mapped to 2.

I have used the command

merge(DF1, DF2, by=c("A","B"), all.x=TRUE)

However, this does not match rows 3 and 4 correctly, only rows 1 and 2. I get the output:

      A     B         C     D
1   cats    kittens   88    1
2   dogs    puppies   99    2
3   dogs    NA        110   NA
4     NA    kittens   101   NA

Please note that the actual datasets I'm working with are very long: DF1 has over 1,000,000 rows and DF2 has over 300,000 rows, so what I really need is a solution that scales.

Starcalibre asked Apr 30 '13 06:04



1 Answer

Perhaps you can try something along these lines:

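# Left join on both keys; rows 3 and 4 have an NA key, so their D comes back NA
temp <- merge(DF1, DF2, by=c("A","B"), all.x=TRUE)

# Overwrite D: 1 if either column holds a "cats"/"kittens" label, otherwise 2
within(temp, {
  M1 <- c("cats", "kittens")
  D <- ifelse(A %in% M1 | B %in% M1, 1, 2)
  rm(M1)  # drop the helper so it does not become a column
})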
#      A       B   C D
# 1 cats kittens  88 1
# 2 dogs puppies  99 2
# 3 dogs    <NA> 110 2
# 4 <NA> kittens 101 1

Note that this hard-codes the mapping: every row that matches neither "cats" nor "kittens" gets 2. You can nest ifelse statements if you need more than just these two options.
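With the 300,000 rows of DF2 mentioned in the question, writing out the labels by hand is not practical. One possible alternative, offered only as a sketch, is to look up D with match() on each column separately and fall back from A to B. It assumes that each value of A and each value of B appears at most once in DF2 (match() only returns the first hit), and that a match on A should win if A and B ever point at different rows of DF2; neither assumption is stated in the question. The names idxA, idxB, idx and result are just illustrative.

```r
# Toy data from the question (stringsAsFactors = FALSE keeps the labels as characters)
DF1 <- data.frame(A = c("cats", "dogs", NA, "dogs"),
                  B = c("kittens", "puppies", "kittens", NA),
                  C = c(88, 99, 101, 110),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(D = c(1, 2),
                  A = c("cats", "dogs"),
                  B = c("kittens", "puppies"),
                  stringsAsFactors = FALSE)

# Row of DF2 matched through each column; incomparables = NA means an NA
# in DF1 is never treated as a match, so it yields NA here
idxA <- match(DF1$A, DF2$A, incomparables = NA)
idxB <- match(DF1$B, DF2$B, incomparables = NA)

# Assumption: prefer the match on A and fall back to B only when A gave none
idx <- ifelse(is.na(idxA), idxB, idxA)

# Rows with no match in either column get D = NA
result <- DF1
result$D <- DF2$D[idx]
result
```

Unlike merge(), this keeps DF1 in its original row order, and because match() is vectorised it should cope with data of the size described, though I have not timed it on data that large.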

A5C1D2H2I1M1N2O1R2T1 answered Nov 15 '22 08:11
