Drop duplicates while preserving NaNs in pandas

When I use the drop_duplicates() method, duplicates are removed, but all NaNs are also collapsed into a single entry. How can I drop duplicates while preserving every row with an empty entry (like np.nan, None or '')?

import numpy as np
import pandas as pd
df = pd.DataFrame({'col':['one','two',np.nan,np.nan,np.nan,'two','two']})

Out[]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
5  two
6  two


df.drop_duplicates(['col'])

Out[]: 
   col
0  one
1  two
2  NaN

asked May 07 '14 08:05

bioslime

2 Answers

Try keeping every row that is either not a duplicate or is null:

df[(~df.duplicated()) | (df['col'].isnull())]

The result is:

   col
0  one
1  two
2  NaN
3  NaN
4  NaN
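
Note that isnull() only flags NaN and None, not the empty string that the question also mentions. A minimal sketch of how the same mask could be extended, assuming '' should count as "empty" too (the sample data and the names is_empty and result are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing NaN, None and '' as "empty" entries
df = pd.DataFrame({'col': ['one', 'two', np.nan, None, '', 'two', '']})

# Assumption: NaN, None and the empty string all count as "empty"
is_empty = df['col'].isnull() | (df['col'] == '')

# Keep a row if it is empty, or if it is the first occurrence of its value
result = df[is_empty | ~df.duplicated(subset=['col'])]
print(result)
```

Only the repeated 'two' at index 5 is dropped here; all empty rows stay, including the second ''.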

answered Sep 20 '22 04:09

user666

One workaround, though not a pretty one, is to first set aside the NaN rows and then merge them back in:

temp = df[df.isnull().any(axis=1)]  # rows that contain at least one NaN
asd = df.drop_duplicates('col')
pd.merge(temp, asd, how='outer')
Out[81]: 
   col
0  one
1  two
2  NaN
3  NaN
4  NaN
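
The same "set aside and put back" idea can also be sketched with pd.concat instead of a merge. This is a swapped-in variant, not the code above: it deduplicates only the non-NaN rows and then restores the original order through the index. The names mask, nan_rows, deduped and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['one', 'two', np.nan, np.nan, np.nan, 'two', 'two']})

mask = df['col'].isnull()
nan_rows = df[mask]                                # set the NaN rows aside
deduped = df[~mask].drop_duplicates(subset='col')  # deduplicate only the non-NaN rows

# Stack both pieces and restore the original row order via the index
result = pd.concat([deduped, nan_rows]).sort_index()
print(result)
```

Unlike the merge, this keeps the original index labels, so the row order does not depend on how the join sorts its keys.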

answered Sep 21 '22 04:09

FooBar
