Cleaning the duplicates with a reference from another data frame

Question

I want to remove duplicates by using the correct information held in another data frame.

The problem is that the original data contains duplicates with both right and wrong values. The right values are defined in another data frame, so I want to use that data frame as a reference for those rows.

So I need a conditional operation that compares rows across the two data frames. To illustrate, let's say the original data is tree1:

tree1 = data.frame( 
sp = c("oak","pine","apple","birch","oak","pine","apple","maple"), 
code = c(23:26,77,88,99,27))
> tree1
     sp code
1   oak   23
2  pine   24
3 apple   25
4 birch   26
5   oak   77
6  pine   88
7 apple   99
8 maple   27

And the reference data is tree2:

tree2 = data.frame( sp = c("oak","pine","apple"),
                    code = 23:25)
> tree2
     sp code
1   oak   23
2  pine   24
3 apple   25

My desired output, with the wrong-valued duplicates removed and the rest of the original data kept, should look like this:

> tree3
     sp code
1   oak   23
2  pine   24
3 apple   25
4 birch   26
5 maple   27

I know this looks like an easy conditional operation, but I ended up either deleting some original values or keeping the duplicates with wrong values (the other way around does not work either). A simple base R solution would be great.

Ronak Shah · Accepted Answer

One option in base R uses mapply. Assuming tree1 and tree2 have the same columns in the same order, we can check, column by column, which values in tree1 are present in tree2, and keep only the rows where all the values match or none of them match.

vals <- rowSums(mapply(`%in%`, tree1, tree2))
tree1[vals == ncol(tree1) | vals == 0, ]

#    sp  code
#1   oak   23
#2  pine   24
#3 apple   25
#4 birch   26
#8 maple   27
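Note that the column-wise `%in%` check tests each column independently, so a row such as ("oak", 24) would be kept because both values occur somewhere in tree2, and a row such as ("birch", 23) would be dropped even though birch is not in the reference. If that matters, a minimal sketch that compares each row against its own reference entry could look like the following. It assumes every species appears at most once in tree2 and that code contains no NA values; the names in_ref, ref_code, keep and tree3 are only illustrative.

```r
tree1 <- data.frame(
  sp = c("oak", "pine", "apple", "birch", "oak", "pine", "apple", "maple"),
  code = c(23:26, 77, 88, 99, 27),
  stringsAsFactors = FALSE
)
tree2 <- data.frame(
  sp = c("oak", "pine", "apple"),
  code = 23:25,
  stringsAsFactors = FALSE
)

# TRUE for rows whose species has an entry in the reference
in_ref <- tree1$sp %in% tree2$sp

# Reference code for each row of tree1 (NA when the species is not in tree2)
ref_code <- tree2$code[match(tree1$sp, tree2$sp)]

# Keep rows that are absent from the reference, or whose code agrees with it
keep <- !in_ref | tree1$code == ref_code

tree3 <- tree1[keep, ]
rownames(tree3) <- NULL  # renumber the rows 1..n
tree3
```

Rows that are not covered by the reference are always kept here, and the original row order of tree1 is preserved.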

nghauran · Answer

Here is a dplyr option:

library(dplyr)
tree2bis <- filter(tree1, !(sp %in% tree2$sp)) # rows of tree1 whose species is not in the reference
tree1 %>% inner_join(tree2, by = c("sp", "code")) %>% bind_rows(tree2bis) # matched rows first, then the unreferenced rows
# output
     sp code
1   oak   23
2  pine   24
3 apple   25
4 birch   26
5 maple   27
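As a variation on the same idea, the filtering joins in dplyr can express "drop the rows whose species is in the reference but whose code disagrees" without binding two pieces back together. This is only a sketch under the assumption that sp and code together identify a row; wrong and tree3 are illustrative names.

```r
library(dplyr)

tree1 <- data.frame(
  sp = c("oak", "pine", "apple", "birch", "oak", "pine", "apple", "maple"),
  code = c(23:26, 77, 88, 99, 27),
  stringsAsFactors = FALSE
)
tree2 <- data.frame(
  sp = c("oak", "pine", "apple"),
  code = 23:25,
  stringsAsFactors = FALSE
)

# Rows whose species is in the reference but whose (sp, code) pair is not
wrong <- tree1 %>%
  semi_join(tree2, by = "sp") %>%
  anti_join(tree2, by = c("sp", "code"))

# Remove those rows from the original data, keeping its row order
tree3 <- tree1 %>% anti_join(wrong, by = c("sp", "code"))
tree3
```

Because anti_join only filters tree1, the result keeps the original ordering of the rows rather than placing the matched rows first.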
