Unique on a dataframe with only selected columns

Tags: r, unique

I have a data frame with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it to work with unique or duplicated myself.

In the example below, I would like to determine uniqueness using only id and id2:

    data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

    id id2 somevalue
     1   1         x
     1   1         y
     3   4         z

I would like to obtain either:

    id id2 somevalue
     1   1         x
     3   4         z

or:

    id id2 somevalue
     1   1         y
     3   4         z

(I have no preference as to which of the duplicate rows is kept.)

Ina asked Mar 30 '12 14:03


1 Answer

Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:

    > dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
    > dat[!duplicated(dat[,c('id','id2')]),]
      id id2 somevalue
    1  1   1         x
    3  3   4         z

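Inside the duplicated call, I'm passing only those columns from dat that I don't want duplicates of. duplicated flags every row whose id/id2 combination has already appeared earlier in the data frame, so negating it keeps the first row of each combination and drops the later ones. (In this case, the row with x is kept and the row with y is dropped.)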
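If you would rather end up with the second result from the question (keeping y instead of x), one option is the fromLast argument of duplicated, which scans from the bottom of the data frame upward. The following is a minimal sketch using the same example data; the names key_cols, first_kept and last_kept are only illustrative and do not appear in the question.

```r
# Example data from the question
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# Columns that define uniqueness (illustrative name)
key_cols <- c("id", "id2")

# Keep the first row of each (id, id2) combination
first_kept <- dat[!duplicated(dat[, key_cols]), ]

# Keep the last row of each combination instead:
# fromLast = TRUE makes duplicated() flag the earlier occurrences
last_kept <- dat[!duplicated(dat[, key_cols], fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```

Both calls leave all the other columns untouched, so the same pattern should carry over to a data frame with 100+ columns; only key_cols needs to change.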

joran answered Sep 20 '22 12:09