Remove duplicate rows based on conditions from multiple columns in R

I have a data set, and I would like to remove the rows whose values are identical across several columns (y1 to y6 in the example below).

foo<- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"), v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"), y2= c("y","y","y","f","c"), y3 = c("y","c","c","f","w"), y4= c("y","y","f","f","c"), y5=c("y","w","f","f","w"), y6=c("y","c","f","f","w"))

foo then looks like:

  g1 v1 v2 y1 y2 y3 y4 y5 y6
1  1  7  a  y  y  y  y  y  y
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
4  1  4  x  f  f  f  f  f  f
5  1  3  e  w  c  w  c  w  w

Now, I want to remove any row whose values are all the same across the y1 to y6 columns. If done properly, only rows 1 and 4 would be removed, because all of their y values are exactly the same. It's a multiple-column condition.

I believe I am close, but it's just not working correctly.

I have tried:

new = foo[!(duplicated(foo[,1:6]))]

thinking that the duplicated command would search those columns and find only the rows that matched exactly.
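Why the attempt above does not help can be checked directly. The sketch below (assuming the example foo, rebuilt with stringsAsFactors = FALSE so the columns are character vectors in any R version) shows that foo[, 1:6] selects g1 to y3 rather than the y columns, and that duplicated() compares each whole row against earlier rows, not the values within a single row. Note also that without a comma, foo[...] indexes columns rather than rows.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps character columns.
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# foo[, 1:6] picks g1, v1, v2, y1, y2, y3, not the six y columns.
names(foo)[1:6]

# duplicated() on a data frame flags a row that repeats an EARLIER row.
# No two rows share the same y1..y6 values, so nothing is flagged.
duplicated(foo[, paste0("y", 1:6)])
#> [1] FALSE FALSE FALSE FALSE FALSE

# The question is about values repeated WITHIN one row, a per-row test:
length(unique(unlist(foo[4, paste0("y", 1:6)]))) == 1
#> [1] TRUE
```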

I also thought about using a conditional statement with &, but I can't figure out how to do that either:

new = foo[foo$y1==foo$y2|foo$y3|foo$y4|foo$y5|foo$y6]
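The | expression above cannot work as written: in R, == binds more tightly than |, so it parses as (foo$y1 == foo$y2) | foo$y3 | ..., and | is not defined for character vectors. A minimal sketch of the & version, assuming the example foo with character columns, compares y1 with each of the other five columns; all_same is an illustrative name.

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

# "All six values are identical": y1 equals every other y column, joined by &.
all_same <- foo$y1 == foo$y2 & foo$y1 == foo$y3 & foo$y1 == foo$y4 &
            foo$y1 == foo$y5 & foo$y1 == foo$y6

# The comma matters: foo[rows, ] subsets rows, foo[x] subsets columns.
new <- foo[!all_same, ]
new
```

This keeps rows 2, 3 and 5, but it has to be extended by hand if more y columns are added.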

I also considered which(), but I'm now overwhelmed and lost. I would expect the result to look like:

  g1 v1 v2 y1 y2 y3 y4 y5 y6
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
5  1  3  e  w  c  w  c  w  w
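which() becomes useful once there is a per-row logical test. One vectorized sketch (an alternative to the answers below, not taken from them) counts, for each row, how many y values equal y1 and keeps the rows where that count is below six; ycols, same_as_y1 and keep are illustrative names.

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)
ycols <- paste0("y", 1:6)

# as.matrix() gives a 5 x 6 character matrix; the length-5 vector foo$y1 is
# recycled down each column, so element [i, j] is compared with foo$y1[i].
same_as_y1 <- as.matrix(foo[, ycols]) == foo$y1

# A row is constant when all six comparisons are TRUE; which() returns the
# indices of rows where at least one y value differs from y1.
keep <- which(rowSums(same_as_y1) < length(ycols))
foo[keep, ]
```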
asked Sep 14 '12 13:09 by Kerry





2 Answers

> foo[apply(foo[ , paste("y", 1:6, sep = "")], 1,
            FUN = function(x) length(unique(x)) > 1 ), ]
  g1 v1 v2 y1 y2 y3 y4 y5 y6
2  0  5  b  c  y  c  y  w  c
3  0  4  x  f  y  c  f  f  f
5  1  3  e  w  c  w  c  w  w
answered Sep 20 '22 00:09 by Sven Hohenstein
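The apply() approach above can be wrapped in a small function so that the columns to check become a parameter. drop_constant_rows is a hypothetical name used only for this sketch, which assumes the example foo with character columns.

```r
# Hypothetical helper: keep rows whose values in `cols` are not all identical.
drop_constant_rows <- function(df, cols) {
  # apply() coerces the selected columns to a matrix and tests each row.
  varies <- apply(df[, cols, drop = FALSE], 1,
                  function(x) length(unique(x)) > 1)
  df[varies, , drop = FALSE]
}

foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)

res <- drop_constant_rows(foo, paste0("y", 1:6))
res
```

drop = FALSE keeps each subset a data frame even when only one column is selected.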



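# Restrict the test to y1..y6; on all of foo, g1, v1 and v2 would also be compared
foo[apply(foo[, paste0("y", 1:6)], 1, function(x) any(x != x[1])), ]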
answered Sep 22 '22 00:09 by Backlin
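Backlin's comparison against the first element can also be done column by column without apply(), which avoids converting the data frame to a character matrix. A sketch using base R's Reduce() and lapply(), assuming the example foo with character columns; differs is an illustrative name.

```r
foo <- data.frame(g1 = c("1","0","0","1","1"), v1 = c("7","5","4","4","3"),
                  v2 = c("a","b","x","x","e"), y1 = c("y","c","f","f","w"),
                  y2 = c("y","y","y","f","c"), y3 = c("y","c","c","f","w"),
                  y4 = c("y","y","f","f","c"), y5 = c("y","w","f","f","w"),
                  y6 = c("y","c","f","f","w"), stringsAsFactors = FALSE)
ycols <- paste0("y", 1:6)

# lapply() calls `!=`(column, foo$y1) for each y column, giving six logical
# vectors; Reduce(`|`) ORs them, so TRUE means some value differs from y1.
differs <- Reduce(`|`, lapply(foo[ycols], `!=`, foo$y1))
foo[differs, ]
```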
