Sort rows of a dataframe in descending order of NaN counts

Tags: python, sorting, pandas, dataframe, nan

I'm trying to sort the following Pandas DataFrame:

         RHS  age  height  shoe_size  weight
0     weight  NaN     0.0        0.0     1.0
1  shoe_size  NaN     0.0        1.0     NaN
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0

in such a way that the rows with more NaN values come first. More precisely, in the above df, the row with index 1 (2 NaNs) should come before the row with index 0 (1 NaN).

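What I do now is the following, which puts NaNs first one sort key at a time (age, then height, and so on) rather than ranking rows by their total NaN count: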

df.sort_values(by=['age', 'height', 'shoe_size', 'weight'], na_position="first")
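To make the difference concrete, here is a small sketch (assuming a recent pandas release) that rebuilds the question's frame, runs the attempted multi-key sort, and computes the per-row NaN counts that the rows actually need to be ordered by:

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

# The attempted sort: NaNs in 'age' go first, and the tie between rows 0 and 1
# is then broken by 'shoe_size' (0.0 < 1.0), not by how many NaNs each row has.
attempt = df.sort_values(by=["age", "height", "shoe_size", "weight"], na_position="first")
print(attempt.index.tolist()[:2])  # [0, 1]

# What the question needs: the number of NaNs in each row
nan_counts = df.isnull().sum(axis=1)
print(nan_counts.tolist())  # [1, 2, 1, 0, 0]
```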


asked Aug 27 '17 22:08

Juan Carlos


4 Answers

Using Series.sort_values and label-based (.loc) indexing:

df = df.loc[df.isnull().sum(axis=1).sort_values(ascending=False).index]
print(df)

         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
2  shoe_size  3.0     0.0        0.0     NaN
0     weight  NaN     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
3     weight  3.0     0.0        0.0     1.0

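df.isnull().sum(axis=1) counts the NaNs in each row; sorting those counts in descending order and selecting rows by the sorted index labels puts the rows with the most NaNs first. Rows with equal counts can come out in any order, because the default sort algorithm (quicksort) is not stable.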
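A variant of this answer, sketched under the assumption of a recent pandas release: since sort_values(...).index contains row labels, .loc is the accessor that matches them (with the question's default 0..4 index, labels and positions happen to coincide). Passing kind="stable" keeps rows with the same NaN count in their original relative order, so the result is deterministic:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

# Count NaNs per row and sort the counts, most NaNs first.
# kind="stable" preserves the original order among equal counts.
order = df.isnull().sum(axis=1).sort_values(ascending=False, kind="stable").index

# `order` holds row labels, so select with .loc
result = df.loc[order]
print(result.index.tolist())  # [1, 0, 2, 3, 4]
```

With the stable sort, row 0 stays ahead of row 2 and row 3 ahead of row 4, since each pair has the same NaN count.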

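@ayhan suggested a small improvement using pd.Series.argsort, which returns integer positions (hence .iloc); negating the counts with mul(-1) makes the ascending argsort put the rows with the most NaNs first: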

df = df.iloc[df.isnull().sum(axis=1).mul(-1).argsort()]
print(df)

         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
0     weight  NaN     0.0        0.0     1.0
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
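Because Series.argsort yields positions rather than labels, this variant also works when the row labels are not 0..n-1. The sketch below adds hypothetical string row labels "a" to "e" purely to make the label/position distinction visible, and uses a stable argsort so ties keep their original order:

```python
import numpy as np
import pandas as pd

# Question's data with hypothetical string row labels
df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
}, index=["a", "b", "c", "d", "e"])

# Negate the counts so an ascending argsort means "most NaNs first";
# the result contains integer positions, which is what .iloc expects.
positions = df.isnull().sum(axis=1).mul(-1).argsort(kind="stable")
result = df.iloc[positions.to_numpy()]
print(result.index.tolist())  # ['b', 'a', 'c', 'd', 'e']
```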


answered Oct 11 '22 14:10

cs95

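df.isnull().sum().sort_values(ascending=False)

(This sums along the default axis=0, so it counts NaNs per column, not per row; it ranks the columns rather than reordering the rows.)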
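To see the two axes side by side on the question's data, a small sketch assuming a recent pandas release; the last step shows what a per-column ranking is useful for, namely reordering columns instead of rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

col_counts = df.isnull().sum()        # axis=0 (default): one count per column
row_counts = df.isnull().sum(axis=1)  # axis=1: one count per row
print(col_counts.tolist())  # [0, 2, 0, 0, 2]
print(row_counts.tolist())  # [1, 2, 1, 0, 0]

# Per-column counts reorder the columns (stable, so ties keep their order)
by_cols = df[col_counts.sort_values(ascending=False, kind="stable").index]
print(by_cols.columns.tolist())  # ['age', 'weight', 'RHS', 'height', 'shoe_size']
```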

answered Oct 11 '22 14:10

Zainab Ali

Here's a one-liner that will do it:

df.assign(Count_NA = lambda x: x.isnull().sum(axis=1)).sort_values('Count_NA', ascending=False).drop('Count_NA', axis=1)
#          RHS  age  height  shoe_size  weight
# 1  shoe_size  NaN     0.0        1.0     NaN
# 0     weight  NaN     0.0        0.0     1.0
# 2  shoe_size  3.0     0.0        0.0     NaN
# 3     weight  3.0     0.0        0.0     1.0
# 4        age  3.0     0.0        0.0     1.0

This works by assigning a temporary column ("Count_NA") to count the NAs in each row, sorting on that column, and then dropping it, all in the same expression.
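The same assign/sort/drop chain can be wrapped in a small helper. The name sort_rows_by_nan_count and its temp_col parameter are hypothetical, chosen for illustration, and kind="stable" is an addition so that ties keep their original order. Because assign returns a new DataFrame, the caller's frame never gains the helper column:

```python
import numpy as np
import pandas as pd

def sort_rows_by_nan_count(frame, temp_col="Count_NA"):
    """Hypothetical helper: copy of `frame` with rows ordered by descending NaN count.

    temp_col must not already be a column of `frame`, otherwise assign()
    would overwrite it and the final drop would remove the real data.
    """
    return (
        frame.assign(**{temp_col: lambda x: x.isnull().sum(axis=1)})
        .sort_values(temp_col, ascending=False, kind="stable")
        .drop(columns=temp_col)
    )

df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

result = sort_rows_by_nan_count(df)
print(result.index.tolist())     # [1, 0, 2, 3, 4]
print("Count_NA" in df.columns)  # False
```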

answered Oct 11 '22 15:10

cmaher

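You can add a column with the number of null values per row, sort by that column, then drop it. Use .reset_index(drop=True) afterwards if you also want the row labels renumbered from 0. Note that the first line adds null_count to df itself; the .drop only removes it from the returned, sorted copy.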

df['null_count'] = df.isnull().sum(axis=1)
df.sort_values('null_count', ascending=False).drop('null_count', axis=1)

# returns
         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
0     weight  NaN     0.0        0.0     1.0
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
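A sketch of this approach that also renumbers the result with reset_index(drop=True) and removes the helper column from the original frame afterwards; kind="stable" is an addition so that tied rows keep their original order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

df["null_count"] = df.isnull().sum(axis=1)   # helper column, added in place
result = (
    df.sort_values("null_count", ascending=False, kind="stable")
    .drop("null_count", axis=1)               # drop it from the sorted copy
    .reset_index(drop=True)                   # renumber rows 0..n-1
)
df = df.drop(columns="null_count")           # clean up the original frame too

print(result.index.tolist())  # [0, 1, 2, 3, 4]
print(result["RHS"].tolist())  # ['shoe_size', 'weight', 'shoe_size', 'weight', 'age']
```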

answered Oct 11 '22 13:10

James

