
R finding rows of a data frame where certain columns match those of another [duplicate]

I have an R question that I'm not even sure how to word in one sentence, and I couldn't find an answer for it yet.

I have two data frames that I would like to 'intersect' to find all rows where the values in two columns match. I've tried connecting two intersect() and which() statements with &&, but neither has given me what I want yet.

Here's what I mean. Let's say I have two data frames:

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 [email protected] EIFLS0LS        1       0      0       0         0            0
2 [email protected] EIFLS0LS        1       0      0       0         0            0
3     [email protected] EIFLS0LS        1       0      0       0         0            0
4    [email protected] EIFLS0LS        1       0      0       0         0            0
5          [email protected] EIFLS0LS        1       0      0       0         0            0
6     [email protected] EIFLS0LS        1       0      0       0         0            0

> testBounced
               Email Campaign
1 [email protected]        1
2 [email protected]        2
3     [email protected]        2
4    [email protected]        1
5          [email protected]        1
6        [email protected]        1

As you can see, there are some values in the column Email that intersect, and some from the column Campaign that intersect. I want all of the rows from testData in which BOTH columns match.

i.e.:

               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 [email protected] EIFLS0LS        1       0      0       0         0            0
2    [email protected] EIFLS0LS        1       0      0       0         0            0
3          [email protected] EIFLS0LS        1       0      0       0         0            0

EDIT:

My goal in finding these rows is to be able to update a column in the original data frame. So the final output that I would like is:

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 [email protected] EIFLS0LS        1       1      0       0         0            0
2 [email protected] EIFLS0LS        1       0      0       0         0            0
3     [email protected] EIFLS0LS        1       0      0       0         0            0
4    [email protected] EIFLS0LS        1       1      0       0         0            0
5          [email protected] EIFLS0LS        1       1      0       0         0            0
6     [email protected] EIFLS0LS        1       0      0       0         0            0

My apologies if this is a duplicate, and thanks in advance for your help!

EDIT2:

I ended up just using a for loop. It works, but it doesn't feel efficient. The dataset was small enough to do it quickly, though. If anyone has a quick, R-style way to do it, I'd be happy to see it!

asked Jul 26 '13 18:07 by so13eit


1 Answer

You want the function merge.

merge is commonly used to join two tables on a single common column, but the by argument accepts multiple columns:

merge(testData, testBounced, by=c("Email", "Campaign"))

All pairs of Email and Campaign that don't match will be discarded by default. That's controllable by the arguments all.x and all.y, which default to FALSE.
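
As a rough illustration of the call above, here is a minimal sketch on toy data. The addresses are hypothetical placeholders (the originals are not recoverable from the post) and the data frames are trimmed to a few columns, so treat it as an assumption-laden example rather than the asker's real data:

```r
# Toy stand-ins for the question's data frames (placeholder addresses,
# only a subset of the original columns).
testData <- data.frame(
  Email    = c("a@example.com", "b@example.com", "c@example.com",
               "d@example.com", "e@example.com", "f@example.com"),
  Manual   = "EIFLS0LS",
  Campaign = 1,
  Bounced  = 0,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email    = c("a@example.com", "b@example.com", "c@example.com",
               "d@example.com", "e@example.com", "g@example.com"),
  Campaign = c(1, 2, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Inner join on both keys: only rows where Email AND Campaign match are kept.
matched <- merge(testData, testBounced, by = c("Email", "Campaign"))

# all.x = TRUE keeps every row of testData, matched or not (a left join).
kept <- merge(testData, testBounced, by = c("Email", "Campaign"), all.x = TRUE)
```

Here matched should contain only the rows whose Email and Campaign both appear together in testBounced, while kept retains all six rows of testData.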

The default argument for by is intersect(names(x), names(y)), so you technically don't need to specify the columns in this case, but it's good for clarity.
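
The question's edit asks for the Bounced column of testData to be set to 1 in the matching rows, which merge alone does not do in place. One possible sketch, not part of the original answer, builds a composite key from the two columns and flags the rows whose key appears in testBounced. The placeholder addresses and the "|" separator are assumptions; the separator must not occur in the real values:

```r
# Toy stand-ins for the question's data frames (placeholder addresses).
testData <- data.frame(
  Email    = c("a@example.com", "b@example.com", "c@example.com",
               "d@example.com", "e@example.com", "f@example.com"),
  Campaign = 1,
  Bounced  = 0,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email    = c("a@example.com", "b@example.com", "c@example.com",
               "d@example.com", "e@example.com", "g@example.com"),
  Campaign = c(1, 2, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Composite key per row: Email and Campaign pasted together.
# Assumes "|" never appears in either column.
key_data    <- paste(testData$Email, testData$Campaign, sep = "|")
key_bounced <- paste(testBounced$Email, testBounced$Campaign, sep = "|")

# Vectorised update: flag rows of testData whose key occurs in testBounced.
testData$Bounced[key_data %in% key_bounced] <- 1
```

This avoids an explicit for loop because %in% compares all keys at once.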

answered Sep 23 '22 03:09 by Señor O