
Remove duplicated rows (based on 2 columns) in R

I have a dataset in R which looks like this:

   x1   x2 x3
1:  A Away  2
2:  A Home  2
3:  B Away  2
4:  B Away  1
5:  B Home  2
6:  B Home  1
7:  C Away  1
8:  C Home  1

Based on the values in columns x1 and x2, I want to remove the duplicate rows. I have tried the following:

df[!duplicated(df[,c('x1', 'x2')]),]

It should remove rows 4 and 6. But unfortunately it is not working, as it returns exactly the same data, with the duplicates still present in the dataset. What do I have to use in order to remove rows 4 and 6?

asked Jul 28 '16 13:07 by sander



1 Answer

I'd just do:

unique(df, by=c("x1", "x2")) # where df is a data.table
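To check this against the data in the question, here is a minimal sketch. The `data.table()` call is only my reconstruction of the printed output (I'm assuming `x1` and `x2` are character and `x3` is numeric), not the asker's actual code:

```r
library(data.table)

# Rebuild the example table (reconstructed from the printed output in the question)
df <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

# Keep only the first row of each (x1, x2) combination
res <- unique(df, by = c("x1", "x2"))
print(res)
```

`unique()` keeps the first occurrence within each group, so rows 4 and 6 are the ones dropped. If you'd rather keep the last occurrence, the data.table method also takes `fromLast = TRUE`.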

This'd have been quite obvious if you'd just looked at ?unique.

PS: given the syntax in your Q, I wonder if you are aware of the basic differences between data.table and data.frame's syntax. I suggest you read the vignettes first.
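To illustrate the syntax difference: as far as I recall, in data.table versions from around the time of the question (before 1.9.8), `df[, c('x1', 'x2')]` evaluated `j` as an expression and returned the character vector itself rather than the two columns, which would explain why `duplicated()` flagged nothing. Treat that version detail as my assumption and check the NEWS file. A sketch of two spellings that should work on a data.table regardless, again using a reconstructed copy of the data:

```r
library(data.table)

# Reconstructed copy of the table from the question (an assumption, not the asker's code)
df <- data.table(
  x1 = c("A", "A", "B", "B", "B", "B", "C", "C"),
  x2 = c("Away", "Home", "Away", "Away", "Home", "Home", "Away", "Home"),
  x3 = c(2, 2, 2, 1, 2, 1, 1, 1)
)

cols <- c("x1", "x2")

# Option 1: select columns by name explicitly with with = FALSE,
# then filter on the logical vector from duplicated()
keep1 <- df[!duplicated(df[, cols, with = FALSE])]

# Option 2: let data.table's duplicated() method do the column selection via `by`
keep2 <- df[!duplicated(df, by = cols)]
```

Both give the same six rows as `unique(df, by = cols)`. The `duplicated()` form is mainly useful when you need the logical vector itself, e.g. to inspect which rows would be dropped.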

answered Sep 21 '22 23:09 by Arun