
Fastest Way to Drop Duplicated Index in a Pandas DataFrame [duplicate]

If I want to drop rows with duplicated index values in a DataFrame, the following doesn't work, for obvious reasons:

myDF.drop_duplicates(cols=index) 

and

myDF.drop_duplicates(cols='index')  

looks for a column named 'index'

To drop the duplicated index entries, I have to do:

myDF['index'] = myDF.index
myDF = myDF.drop_duplicates(cols='index')
myDF.index = myDF['index']
myDF = myDF.drop('index', axis=1)

Is there a more efficient way?

RukTech asked Apr 07 '14 16:04




2 Answers

Simply: DF.groupby(DF.index).first()
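
One caveat worth checking before relying on this: first() returns the first non-null value in each column for every group, and groupby sorts the group keys by default. The sketch below uses a small made-up frame (the data and variable names are illustrative, not from the question) to show both effects.

```python
import numpy as np
import pandas as pd

# Illustrative data: label "y" appears twice, and its first row has a NaN in "a".
df = pd.DataFrame(
    {"a": [np.nan, 1.0, 2.0], "b": [10.0, 20.0, 30.0]},
    index=["y", "y", "x"],
)

# One row per index label; each column takes its first non-null value.
grouped = df.groupby(df.index).first()

# Row "y" combines a=1.0 (from the second "y" row) with b=10.0 (from the first),
# and the labels come back sorted as ["x", "y"].
print(grouped)
```

If the original row order matters, passing sort=False to groupby keeps labels in order of first appearance. If rows must be kept intact even when they contain NaN, a boolean mask on the index is likely the safer choice.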

CT Zhu answered Sep 19 '22 16:09


The 'duplicated' method works for DataFrames and Series, and also for the index itself. It returns a boolean mask, so just select the rows whose index label isn't marked as a duplicate:

df[~df.index.duplicated()] 
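
As a minimal sketch of how the mask behaves, the example below uses a small made-up frame (the data and variable names are illustrative) and the keep parameter of Index.duplicated to choose which occurrence survives.

```python
import pandas as pd

# Illustrative data: label "a" appears twice.
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=["a", "b", "a", "c"])

# keep="first" (the default) marks every occurrence after the first as a duplicate.
first_kept = df[~df.index.duplicated(keep="first")]

# keep="last" marks every occurrence before the last as a duplicate.
last_kept = df[~df.index.duplicated(keep="last")]

# keep=False marks all occurrences, so repeated labels are dropped entirely.
none_kept = df[~df.index.duplicated(keep=False)]

print(first_kept)
print(last_kept)
print(none_kept)
```

Unlike the groupby approach, this keeps whole rows unchanged and preserves the original row order.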
danielstn answered Sep 20 '22 16:09