Drop Duplicates in a DataFrame Keeping the Row with the Least Nulls

Tags: python, pandas
With this DataFrame:

import pandas as pd

d = {'A': pd.Series(['AA', 'AA', 'AA', 'BB', 'CC'],
                    index=['a', 'b', 'c', 'd', 'e']),
     'B': pd.Series([1., 2., 3.], index=['b', 'd', 'e']),
     'C': pd.Series([4., 5., 6.], index=['b', 'd', 'e']),
     'D': pd.Series([1., 2., 3., 4.], index=['a', 'c', 'd', 'e'])}
df = pd.DataFrame(d)

In[1]: pd.DataFrame(d)

Out[1]: 
     A    B    C    D
 a  AA  NaN  NaN  1.0
 b  AA  1.0  4.0  NaN
 c  AA  NaN  NaN  2.0
 d  BB  2.0  5.0  3.0
 e  CC  3.0  6.0  4.0

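I would like to drop duplicates on column A and, for each duplicated value, keep the row with the fewest null values in the remaining columns. Something like this (magical_answer is not a real argument, just a placeholder for what I am looking for):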

In[2]: pd.DataFrame(d).drop_duplicates(subset='A', magical_answer=True)

Out[2]: 
     A    B    C    D
 b  AA  1.0  4.0  NaN
 d  BB  2.0  5.0  3.0
 e  CC  3.0  6.0  4.0

One issue this example does not cover: several rows could tie for the fewest nulls. In that case it would be useful to have a keep : {'first', 'last'} argument.

asked May 03 '17 20:05

it's-yer-boy-chet

2 Answers

An alternative is to count the non-null values in each row, sort the DataFrame by that count within each value of A, and keep the last row of each group, which is the one with the highest count.

(df.assign(counts=df.count(axis=1))
   .sort_values(['A', 'counts'])
   .drop_duplicates('A', keep='last')
   .drop('counts', axis=1))
Out: 
    A    B    C    D
b  AA  1.0  4.0  NaN
d  BB  2.0  5.0  3.0
e  CC  3.0  6.0  4.0
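
To cover the tie case raised in the question, the same count-and-sort idea can be wrapped in a small helper. This is only a sketch under some assumptions: drop_duplicates_fewest_nulls and its keep argument are hypothetical names (not part of pandas), the extra row f is made-up data added to create a tie with row b, and the count ignores the column(s) being deduplicated on. A stable sort keeps tied rows in their original order, so keep decides which of them survives.

```python
import numpy as np
import pandas as pd


def drop_duplicates_fewest_nulls(df, subset, keep="first"):
    """Hypothetical helper: per group in `subset`, keep the row with the fewest nulls."""
    subset = [subset] if isinstance(subset, str) else list(subset)
    # Count non-null values only in the columns that are not deduplicated on.
    counts = df.drop(columns=subset).notna().sum(axis=1).to_numpy()
    if keep == "first":
        # Highest count first; the stable sort leaves tied rows in original order.
        order = np.argsort(-counts, kind="stable")
    elif keep == "last":
        # Highest count last; among tied rows the latest one ends up last.
        order = np.argsort(counts, kind="stable")
    else:
        raise ValueError("keep must be 'first' or 'last'")
    ranked = df.iloc[order]
    mask = ~ranked.duplicated(subset=subset, keep=keep).to_numpy()
    # Select the surviving rows by position, restoring their original order.
    return df.iloc[np.sort(order[mask])]


# The question's data plus a made-up row 'f' that ties with row 'b'.
df = pd.DataFrame(
    {
        "A": ["AA", "AA", "AA", "BB", "CC", "AA"],
        "B": [np.nan, 1.0, np.nan, 2.0, 3.0, 7.0],
        "C": [np.nan, 4.0, np.nan, 5.0, 6.0, 8.0],
        "D": [1.0, np.nan, 2.0, 3.0, 4.0, np.nan],
    },
    index=list("abcdef"),
)

first = drop_duplicates_fewest_nulls(df, "A", keep="first")
last = drop_duplicates_fewest_nulls(df, "A", keep="last")
print(first)
print(last)
```

With this data, keep="first" keeps rows b, d, e and keep="last" keeps rows d, e, f. Working with positions (iloc) rather than labels means the helper should also cope with a duplicated index.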

answered Oct 06 '22 23:10

ayhan

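If the index has no duplicate labels, you can count the non-null values per row, group the counts by A, and select the row label with the highest count in each group: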

df.loc[df.notnull().sum(1).groupby(df.A).idxmax()]

#    A    B   C   D
#b  AA  1.0 4.0 NaN
#d  BB  2.0 5.0 3.0
#e  CC  3.0 6.0 4.0
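
The unique-index condition matters because df.loc with a repeated label returns every row carrying that label. If the index might contain duplicates, one possible workaround (a sketch, not the only option) is to do the lookup by position instead. The index labels x, y, z below are made up to demonstrate this, and best_positions is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Same values as in the question, but with deliberately duplicated index labels.
df = pd.DataFrame(
    {
        "A": ["AA", "AA", "AA", "BB", "CC"],
        "B": [np.nan, 1.0, np.nan, 2.0, 3.0],
        "C": [np.nan, 4.0, np.nan, 5.0, 6.0],
        "D": [1.0, np.nan, 2.0, 3.0, 4.0],
    },
    index=["x", "x", "y", "y", "z"],
)

# Replace the labels with 0..n-1 so that idxmax returns row positions.
counts = df.notna().sum(axis=1).reset_index(drop=True)
keys = df["A"].reset_index(drop=True)
best_positions = counts.groupby(keys).idxmax()

# Positional selection is unaffected by the repeated labels.
result = df.iloc[best_positions.to_numpy()]
print(result)
```

When several rows in a group tie for the highest count, idxmax returns the first of them, so this behaves like keep='first'.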

answered Oct 07 '22 00:10

Psidom
