
df.unique() on whole DataFrame based on a column

I have a DataFrame df filled with rows and columns where there are duplicate Ids:

Index   Id   Type
0       a1   A
1       a2   A
2       b1   B
3       b3   B
4       a1   A
...

When I use:

uniqueId = df["Id"].unique()  

I get a list of unique IDs.

However, how can I apply this filtering to the whole DataFrame so that it keeps its structure but the duplicate rows (based on "Id") are removed?

asked Apr 03 '17 by JohnAndrews

People also ask

How do I print unique values from a column in pandas?

You can use the pandas unique() function to get the distinct values present in a column. It returns a numpy array of those unique values.
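For example, with the question's df (a sketch, assuming the sample data shown above):

uniqueId = df["Id"].unique()
print (uniqueId)         # ['a1' 'a2' 'b1' 'b3'] - values in order of first appearance
print (type(uniqueId))   # <class 'numpy.ndarray'>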


1 Answer

It seems you need DataFrame.drop_duplicates with the subset parameter, which specifies which column(s) to check for duplicates:

#keep first duplicate value
df = df.drop_duplicates(subset=['Id'])
print (df)
      Id Type
Index
0     a1    A
1     a2    A
2     b1    B
3     b3    B

#keep last duplicate value
df = df.drop_duplicates(subset=['Id'], keep='last')
print (df)
      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
4     a1    A

#remove all duplicate values
df = df.drop_duplicates(subset=['Id'], keep=False)
print (df)
      Id Type
Index
1     a2    A
2     b1    B
3     b3    B
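For completeness, a minimal self-contained sketch of the same approach (the DataFrame construction below is an assumption, just reproducing the question's sample data):

import pandas as pd

# rebuild the question's sample data (assumed values)
df = pd.DataFrame({'Id': ['a1', 'a2', 'b1', 'b3', 'a1'],
                   'Type': ['A', 'A', 'B', 'B', 'A']})
df.index.name = 'Index'

# keep only the first row for each Id (default keep='first')
print (df.drop_duplicates(subset=['Id']))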
answered Sep 28 '22 by jezrael