When using the drop_duplicates()
method I reduce duplicates but also merge all NaNs
into one entry. How can I drop duplicates while preserving rows with an empty entry (like np.nan, None or ''
)?
import pandas as pd
df = pd.DataFrame({'col':['one','two',np.nan,np.nan,np.nan,'two','two']})
Out[]:
col
0 one
1 two
2 NaN
3 NaN
4 NaN
5 two
6 two
df.drop_duplicates(['col'])
Out[]:
col
0 one
1 two
2 NaN
Use DataFrame. drop_duplicates() to Drop Duplicate and Keep First Rows. You can use DataFrame. drop_duplicates() without any arguments to drop rows with the same values on all columns.
Drop duplicates but keep first drop_duplicates() . The rows that contain the same values in all the columns then are identified as duplicates. If the row is duplicated then by default DataFrame. drop_duplicates() keeps the first occurrence of that row and drops all other duplicates of it.
Try
df[(~df.duplicated()) | (df['col'].isnull())]
The result is :
col
0 one
1 two
2 NaN
3 NaN
4 NaN
Well, one workaround that is not really beautiful is to first save the NaN
and put them back in:
temp = df.iloc[pd.isnull(df).any(1).nonzero()[0]]
asd = df.drop_duplicates('col')
pd.merge(temp, asd, how='outer')
Out[81]:
col
0 one
1 two
2 NaN
3 NaN
4 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With