 

Subset with unique cases, based on multiple columns

Tags: r, unique, subset

I'd like to subset a data frame to include only rows that have unique combinations of three columns. My situation is similar to the one presented in this question, but I'd like to preserve the other columns in my data as well. Here's my example:

```
> df
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
4  8  1  C  78 75
5  8  1  C  50 62
6  9  3  C  75 75
```
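A reproducible version of this data makes it easier to test the answers. The snippet below is a sketch reconstructed from the printed output; the column types (numeric v1, v2, v4, v5 and character v3) are an assumption, since the post does not show how df was created.

```r
# Rebuild the example data frame from the printed output.
# Assumption: v3 is character; stringsAsFactors = FALSE keeps it that way
# on older R versions, where the default would turn it into a factor.
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75),
  stringsAsFactors = FALSE
)
str(df)
```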

The requested output would be something like this, where I'm looking for unique cases based on v1, v2, and v3 only:

```
> df.new
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
6  9  3  C  75 75
```

If I could also recover the non-unique rows, that would be great:

```
> df.dupes
  v1 v2 v3 v4 v5
3  8  1  C NA 80
4  8  1  C 78 75
5  8  1  C 50 62
```

I saw a related question about how to do this in SQL (here), but I can't figure out how to do it in R. I'm sure it's simple, but experimenting with unique() and subset() hasn't been fruitful. Thanks in advance.
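A short sketch of why unique() alone does not produce df.new, using the example data rebuilt with assumed column types; key_cols, keys_only and all_cols are names introduced here for illustration. unique() on the three key columns finds the right combinations but returns only those columns, while unique() on the whole data frame also compares v4 and v5, so rows 3 to 5 all survive.

```r
# Example data rebuilt from the question (column types assumed).
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75),
  stringsAsFactors = FALSE
)
key_cols <- c("v1", "v2", "v3")

# Right combinations (4 rows), but v4 and v5 are gone.
keys_only <- unique(df[key_cols])
keys_only

# Every column is compared, and rows 3-5 differ in v4/v5,
# so no row counts as a duplicate and all 6 are kept.
all_cols <- unique(df)
nrow(all_cols)
```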

asked Jul 06 '12 21:07 by bosbmgatl



1 Answer

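You can use the duplicated() function to find the unique combinations. duplicated() marks each row whose v1, v2, v3 combination already appeared in an earlier row, so negating it keeps the first occurrence of each combination along with all the other columns: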

```
> df[!duplicated(df[1:3]),]
  v1 v2 v3  v4 v5
1  7  1  A 100 98
2  7  2  A  98 97
3  8  1  C  NA 80
6  9  3  C  75 75
```
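The same idea, sketched with the key columns selected by name rather than by position, which keeps working if the column order changes. key_cols and df.new2 are names introduced for illustration, and the data is rebuilt with assumed column types. The last assignment shows that subset() also works once it is given the duplicated() mask.

```r
# Example data rebuilt from the question (column types assumed).
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75),
  stringsAsFactors = FALSE
)

# Key columns named explicitly instead of df[1:3].
key_cols <- c("v1", "v2", "v3")

# Keep the first row of each key combination, with all columns.
df.new <- df[!duplicated(df[key_cols]), ]
df.new

# Equivalent with subset(): the condition is a plain logical vector.
df.new2 <- subset(df, !duplicated(df[key_cols]))
```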

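To get only the duplicates, you can check in both directions. The forward scan flags only the later repeats, while the scan from the end (fromLast=TRUE) flags every row that has a match further down, including the first occurrence, so combining the two with | selects every row of a repeated combination: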

```
> df[duplicated(df[1:3]) | duplicated(df[1:3], fromLast=TRUE),]
  v1 v2 v3 v4 v5
3  8  1  C NA 80
4  8  1  C 78 75
5  8  1  C 50 62
```
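A sketch that builds the both-directions mask once and reuses it, assuming the same rebuilt data. later_dupes, in_dup_group, df.singles and group_size are hypothetical names for illustration. The first mask shows why a single duplicated() call is not enough (it misses row 3), and counting group sizes with base R's ave() serves as a cross-check that selects the same rows.

```r
# Example data rebuilt from the question (column types assumed).
df <- data.frame(
  v1 = c(7, 7, 8, 8, 8, 9),
  v2 = c(1, 2, 1, 1, 1, 3),
  v3 = c("A", "A", "C", "C", "C", "C"),
  v4 = c(100, 98, NA, 78, 50, 75),
  v5 = c(98, 97, 80, 75, 62, 75),
  stringsAsFactors = FALSE
)
key_cols <- c("v1", "v2", "v3")

# Forward scan only: flags rows 4 and 5, but not row 3 (the first occurrence).
later_dupes <- duplicated(df[key_cols])

# Adding the backward scan flags every row whose combination occurs more than once.
in_dup_group <- later_dupes | duplicated(df[key_cols], fromLast = TRUE)

df.dupes   <- df[in_dup_group, ]   # rows 3, 4, 5
df.singles <- df[!in_dup_group, ]  # rows 1, 2, 6: combinations that occur exactly once

# Cross-check: per-row size of its v1/v2/v3 group, computed with ave().
group_size <- ave(seq_len(nrow(df)), df$v1, df$v2, df$v3, FUN = length)
stopifnot(all((group_size > 1) == in_dup_group))
```

Note that df.singles is not the same as df.new: it drops every repeated combination entirely instead of keeping its first row.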
answered Sep 18 '22 16:09 by Ken Williams
