I want to get rid of duplicates in a data frame by using the correct information stored in another data frame.
The problem is that the original data contains duplicated entries, some with the right values and some with wrong values. The right values are defined in another data frame, so I want to use that data frame as a reference for those rows.
So the operation is conditional on both columns of a row. To illustrate, let's say the original data is tree1:
tree1 = data.frame(
sp = c("oak","pine","apple","birch","oak","pine","apple","maple"),
code = c(23:26,77,88,99,27))
> tree1
sp code
1 oak 23
2 pine 24
3 apple 25
4 birch 26
5 oak 77
6 pine 88
7 apple 99
8 maple 27
And the reference data is tree2:
tree2 = data.frame( sp = c("oak","pine","apple"),
code = 23:25)
> tree2
sp code
1 oak 23
2 pine 24
3 apple 25
My desired output, with the wrong-valued duplicates removed and the rest of the original data kept, should look like this:
> tree3
sp code
1 oak 23
2 pine 24
3 apple 25
4 birch 26
5 maple 27
I know this looks like an easy conditional operation, but my attempts ended up either deleting some of the original values or keeping the duplicates with the wrong values. Simple base R help would be great.
One option uses base R mapply. Assuming tree1 and tree2 have the same columns in the same order, we can check, column by column, which values in tree1 are present in tree2, and keep only those rows where either all of the values match or none of them do.
vals <- rowSums(mapply(`%in%`, tree1, tree2))
tree1[vals == ncol(tree1) | vals == 0, ]
# sp code
#1 oak 23
#2 pine 24
#3 apple 25
#4 birch 26
#8 maple 27
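Note that the check above tests each column independently, so a row such as oak 24 (a valid species paired with another species' valid code) would also be kept. If that matters for your data, here is a minimal base R sketch that compares each species against its own reference code instead. It assumes every species appears at most once in tree2; the names ref_code and keep are only illustrative.

```r
tree1 <- data.frame(
  sp = c("oak", "pine", "apple", "birch", "oak", "pine", "apple", "maple"),
  code = c(23:26, 77, 88, 99, 27))
tree2 <- data.frame(sp = c("oak", "pine", "apple"), code = 23:25)

# Reference code for each row of tree1 (NA when the species is not in tree2)
ref_code <- tree2$code[match(tree1$sp, tree2$sp)]

# Keep rows with no reference entry, or whose code equals the reference code
keep <- is.na(ref_code) | tree1$code == ref_code

tree3 <- tree1[keep, ]
rownames(tree3) <- NULL  # renumber rows 1..n
tree3
```

With the example data this keeps oak 23, pine 24, apple 25, birch 26 and maple 27, and drops the three rows whose code disagrees with tree2.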
Here is a dplyr option:
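library(dplyr)
# rows of tree1 whose species has no entry in the reference tree2
tree2bis <- filter(tree1, !(sp %in% tree2$sp))
# keep the rows that match the reference exactly, then append the unreferenced rows
tree1 %>% inner_join(tree2, by = c("sp", "code")) %>% bind_rows(tree2bis)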
# output
sp code
1 oak 23
2 pine 24
3 apple 25
4 birch 26
5 maple 27