
duplicates in multiple columns

I have a data frame like so:

    > df
      a  b c    d
    1 1  2 A 1001
    2 2  4 B 1002
    3 3  6 B 1002
    4 4  8 C 1003
    5 5 10 D 1004
    6 6 12 D 1004
    7 7 13 E 1005
    8 8 14 E 1006

I want to remove the rows where there are repeated values in column c AND column d. So in this example, rows 2, 3, 5 and 6 would be removed.

I have used this, which works:

    df[!(df$c %in% df$c[duplicated(df$c)] & df$d %in% df$d[duplicated(df$d)]),]

    > df
      a  b c    d
    1 1  2 A 1001
    4 4  8 C 1003
    7 7 13 E 1005
    8 8 14 E 1006

but it seems clunky and I can't help but think there is a better way. Any suggestions?

In case anyone wants to re-create the data frame, here is the dput:

    df = structure(list(a = c(1, 2, 3, 4, 5, 6, 7, 8),
                        b = c(2, 4, 6, 8, 10, 12, 13, 14),
                        c = structure(c(1L, 2L, 2L, 3L, 4L, 4L, 5L, 5L),
                                      .Label = c("A", "B", "C", "D", "E"),
                                      class = "factor"),
                        d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006)),
                   .Names = c("a", "b", "c", "d"),
                   row.names = c(NA, -8L),
                   class = "data.frame")
Davy Kavanagh asked Dec 06 '12 11:12



1 Answer

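It works if you use duplicated twice, once scanning forward and once with fromLast = TRUE, so that every copy of a repeated (c, d) pair is flagged rather than only the later ones: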

    df[!(duplicated(df[c("c","d")]) | duplicated(df[c("c","d")], fromLast = TRUE)), ]

      a  b c    d
    1 1  2 A 1001
    4 4  8 C 1003
    7 7 13 E 1005
    8 8 14 E 1006
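
To see why two calls are needed, it may help to look at the two masks separately. The following is only a sketch against the example data from the question; the variable names (key, dup_forward, dup_backward, in_repeated_pair, result) are made up for illustration, and c is built as a plain character vector rather than a factor, which should not matter here.

```r
# Rebuild the example data from the question.
# Assumption: c is a character vector instead of the original factor.
df <- data.frame(
  a = 1:8,
  b = c(2, 4, 6, 8, 10, 12, 13, 14),
  c = c("A", "B", "B", "C", "D", "D", "E", "E"),
  d = c(1001, 1002, 1002, 1003, 1004, 1004, 1005, 1006),
  stringsAsFactors = FALSE
)

# Only these two columns define what counts as a repeat.
key <- df[c("c", "d")]

# duplicated() flags the second and later copies of each (c, d) pair ...
dup_forward <- duplicated(key)

# ... while fromLast = TRUE scans from the bottom and flags the earlier copies.
dup_backward <- duplicated(key, fromLast = TRUE)

# A row belongs to a repeated pair if either scan flags it.
in_repeated_pair <- dup_forward | dup_backward

# Keep only the rows whose (c, d) pair occurs exactly once.
result <- df[!in_repeated_pair, ]
print(result)
```

Note that this tests the (c, d) pair jointly, whereas the expression in the question tests each column separately. The two agree on this data, but they could differ if a value of c and a value of d were each repeated in different rows.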
Sven Hohenstein answered Sep 19 '22 09:09
