I'm trying to remove all rows whose x value is duplicated. In the example below, I want to remove both rows where x is 2 and all three rows where x is 6. I have tried
xy[!duplicated(xy$x), ]
however, this still keeps the first row of each duplicated value, and I do not want any of those rows.
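The attempt keeps one row per repeated value because duplicated() only flags the second and later occurrences. A small sketch using the x column from the question makes this visible (the variable name dup is just for illustration):

```r
# Sample x column from the question
x <- c(1, 2, 2, 4, 5, 6, 6, 6)

# duplicated() flags only the second and later occurrences of each value
dup <- duplicated(x)
print(dup)
# [1] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

# So !duplicated(x) still keeps the first 2 and the first 6
print(x[!dup])
# [1] 1 2 4 5 6
```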
x <- c(1,2,2,4,5,6,6,6)
y <- c(1888,1999,2000,2001,2004,2005,2010,2011)
xy <- as.data.frame(cbind(x,y))
xy
x y
1 1 1888
2 2 1999
3 2 2000
4 4 2001
5 5 2004
6 6 2005
7 6 2010
8 6 2011
What I want is:
  x    y
1 1 1888
4 4 2001
5 5 2004
Any help is appreciated. I can't hard-code the values to remove, because the real data frame has thousands of records.
You can count how often each x value occurs and keep only the singletons:
xy[ave(xy$x, xy$x, FUN = length) == 1, ]
  x    y
1 1 1888
4 4 2001
5 5 2004
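To see why the ave() approach works, it can help to look at the intermediate vector. ave() applies FUN within each group of xy$x and returns a result aligned with the original rows, so each row receives the size of its group. A minimal sketch (counts and singles are illustrative names, and data.frame(x, y) is used here in place of as.data.frame(cbind(x, y)), which gives the same numeric columns):

```r
x <- c(1, 2, 2, 4, 5, 6, 6, 6)
y <- c(1888, 1999, 2000, 2001, 2004, 2005, 2010, 2011)
xy <- data.frame(x, y)

# Each row gets the number of rows sharing its x value
counts <- ave(xy$x, xy$x, FUN = length)
print(counts)
# [1] 1 2 2 1 1 3 3 3

# Rows whose value occurs exactly once are the singletons
singles <- xy[counts == 1, ]
print(singles)
#   x    y
# 1 1 1888
# 4 4 2001
# 5 5 2004
```

Note that FUN must be passed by name, because ave() collects its grouping variables through `...`.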
We can do:
xy[! xy$x %in% unique(xy[duplicated(xy$x), "x"]), ]
# x y
#1 1 1888
#4 4 2001
#5 5 2004
This works because
unique(xy[duplicated(xy$x), "x"])
returns the values of x that appear more than once (here 2 and 6). xy$x %in% ... then flags every row holding one of those values, and the leading ! drops them.
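A related base-R idiom, not taken from either answer above, fixes the original duplicated() attempt directly: running duplicated() a second time with fromLast = TRUE flags the earlier copies as well, so the union of the two marks every row whose x value repeats. A sketch on the question's data (any_dup and result are illustrative names):

```r
x <- c(1, 2, 2, 4, 5, 6, 6, 6)
y <- c(1888, 1999, 2000, 2001, 2004, 2005, 2010, 2011)
xy <- data.frame(x, y)

# Forward scan flags later copies; backward scan (fromLast = TRUE)
# flags earlier copies. Their union marks every row whose x repeats.
any_dup <- duplicated(xy$x) | duplicated(xy$x, fromLast = TRUE)

# Keep only rows whose x value occurs once
result <- xy[!any_dup, ]
print(result)
```

This avoids building the list of duplicated values explicitly and, like the other answers, never requires naming the values to remove.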