Remove duplicate rows with certain value in specific column

Question

I have a data frame and want to remove rows that are duplicated in every column except one, keeping the row whose value in that column is not a certain value.

In the example reproduced below, the 3rd and 4th rows are identical in every column except col3 (column c in the code), so I want to keep only one of them. The complication is that I want to keep the 4th row rather than the 3rd, because the 3rd row has "excluded" in col3. In general, among duplicated rows, I want to keep only the ones that do not have "excluded".

My real data frame has many duplicated rows, and in each pair of duplicates, one row is always "excluded".

Below is a reproducible example:

a <- c(1,2,3,3,7)
b <- c(4,5,6,6,8)
c <- c("red","green","excluded","orange","excluded")
d <- data.frame(a,b,c)

Thank you so much!

Update: Alternatively, when removing duplicates, keep only the second observation (the 4th row).

SKyJim · Accepted Answer

dplyr with some base R should work for this:

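library(dplyr)

# Example data: the question's rows plus an extra duplicate that is not "excluded" ("brown")
a <- c(1,2,3,3,3,7)
b <- c(4,5,6,6,6,8)
c <- c("red","green","brown","excluded","orange","excluded")
d <- data.frame(a,b,c)

# Keep a row if it is the first occurrence of its (a, b) pair or if c is not "excluded"
d <- filter(d, !duplicated(d[,1:2]) | c!="excluded")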

Result: 
  a b        c
1 1 4      red
2 2 5    green
3 3 6    brown
4 3 6   orange
5 7 8 excluded

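The filter keeps a row if it is the first occurrence of its (a, b) pair or if its value in c is not "excluded", so it drops rows that are both a repeated occurrence and "excluded". An "excluded" row with no duplicate, such as (7, 8), is kept. I also added an extra duplicated row that is not "excluded" ('brown') to your example as a test. Note that duplicated() flags only the second and later occurrences, so this relies on the "excluded" row not being the first row of its duplicate group.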
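
In the question's original data the "excluded" row comes first in its duplicated pair, and since duplicated() marks only later occurrences, the filter above would keep it. Below is a minimal base R sketch of an order-independent variant, assuming rows count as duplicates when columns a and b match. The names key, in_dup_group, result and keep_last are made up for illustration and are not part of the original answer.

```r
# The question's original data, where "excluded" comes first in the duplicated pair
a <- c(1, 2, 3, 3, 7)
b <- c(4, 5, 6, 6, 8)
c <- c("red", "green", "excluded", "orange", "excluded")
d <- data.frame(a, b, c)

# Assumption: rows count as duplicates when columns a and b match
key <- d[, c("a", "b")]

# TRUE for every row whose (a, b) pair occurs more than once, whatever its position
in_dup_group <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Drop a row only if it is "excluded" and belongs to a duplicated group
result <- d[!(in_dup_group & d$c == "excluded"), ]
rownames(result) <- NULL

# Variant for the update in the question: keep the last row of each (a, b) group
keep_last <- d[!duplicated(key, fromLast = TRUE), ]
rownames(keep_last) <- NULL
```

On this data both result and keep_last hold the rows (1, 4, red), (2, 5, green), (3, 6, orange) and (7, 8, excluded). The keep_last variant ignores the value of c entirely, so it only gives the wanted answer when the row to keep is always the last one in its group.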

Tags: dataframe, r

vicky
