
Drop duplicates while preserving NaNs in pandas

When I use the drop_duplicates() method, it removes duplicates but also collapses all NaNs into a single entry. How can I drop duplicates while keeping every row with an empty entry (such as np.nan, None or '')?

import numpy as np
import pandas as pd
df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

Out[]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
5  two
6  two


df.drop_duplicates(['col'])

Out[]: 
   col
0  one
1  two
2  NaN
bioslime asked May 07 '14 08:05



2 Answers

Try keeping every row that is either not a duplicate or is null:

df[(~df.duplicated()) | (df['col'].isnull())]

The result is:

   col
0  one
1  two
2  NaN
3  NaN
4  NaN
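
The question also mentions None and '' as empty entries, which the one-liner above only partly covers (isnull() catches np.nan and None, but not ''). A minimal sketch of the same mask idea wrapped in a helper follows. The function name drop_duplicates_keep_missing and the blank_is_missing flag are hypothetical, introduced only for illustration, and treating '' as missing is an assumption about what the asker wants:

```python
import numpy as np
import pandas as pd


def drop_duplicates_keep_missing(df, col, blank_is_missing=True):
    """Drop duplicate values in `col`, but keep every row whose value is missing."""
    missing = df[col].isna()  # True for np.nan and None
    if blank_is_missing:
        # Assumption: empty strings should also count as "empty" entries.
        missing = missing | df[col].eq('')
    # Keep first occurrences, plus every row flagged as missing.
    return df[~df.duplicated(subset=[col]) | missing]


df = pd.DataFrame({'col': ['one', 'two', np.nan, None, '', '', 'two']})
result = drop_duplicates_keep_missing(df, 'col')
strict = drop_duplicates_keep_missing(df, 'col', blank_is_missing=False)
print(result)
```

With blank_is_missing=False the second '' is dropped like any other duplicate, while the NaN and None rows are still kept.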
user666 answered Sep 20 '22 04:09


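One workaround, though not a pretty one, is to set the NaN rows aside first and then put them back in:

# rows that contain at least one NaN
temp = df[df.isnull().any(axis=1)]
# de-duplicate as usual (this collapses the NaNs into a single row)
asd = df.drop_duplicates('col')
# the outer merge matches NaN keys with each other, giving one row per saved NaN
pd.merge(temp, asd, how='outer')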
Out[81]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
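
A variation on the same split-and-recombine idea, sketched here with pd.concat instead of a merge (this swaps in a different technique from the one in the answer). It de-duplicates only the non-null rows and then restores the original order through the index, so the original row labels survive. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

is_missing = df['col'].isna()
missing_rows = df[is_missing]                      # every NaN row, untouched
deduped = df[~is_missing].drop_duplicates('col')   # de-duplicate only real values

# Recombine the two parts and restore the original row order via the index.
result = pd.concat([deduped, missing_rows]).sort_index()
print(result)
```

Unlike the merge, this does not rely on NaN keys matching each other, and it keeps any other columns of the NaN rows exactly as they were.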
FooBar answered Sep 21 '22 04:09