Understanding df.isnull().mean() in Python

I have a DataFrame df, and the code is written like this:

df.isnull().mean().sort_values(ascending = False)

Here is part of the output:

inq_fi                                 1.0
sec_app_fico_range_low                 1.0

I want to understand how this works.

If we use only df.isnull(), it returns True or False for every cell. How does mean() then give the right output? My objective is to find the percentage of null values in each column. The output above indicates that inq_fi and sec_app_fico_range_low consist entirely of missing values.

Also, why are we not passing by to sort_values?

yashul asked Jan 28 '23 04:01


1 Answer

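The chain breaks down like this:

df.isnull()
# Boolean mask: True where a value is NaN, False otherwise
df.isnull().mean()
# Column-wise mean of the Boolean mask (True counts as 1, False as 0), i.e. the fraction of missing values in each column
df.isnull().mean().sort_values(ascending=False)
# Sort the resulting Series by its values, largest first; a Series holds a single set of values, so no by argument is needed (by is for DataFrame.sort_values)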
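
The three steps can be tried on a small frame. This is a minimal sketch: the DataFrame and its column names (a, b, c) are made up for illustration and are not the asker's data. Since mean() returns a fraction between 0 and 1, multiplying by 100 gives the percentage the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" is entirely missing, "b" is half missing, "c" is complete.
df = pd.DataFrame({
    "b": [1.0, np.nan, 3.0, np.nan],
    "a": [np.nan, np.nan, np.nan, np.nan],
    "c": [1.0, 2.0, 3.0, 4.0],
})

mask = df.isnull()            # DataFrame of True/False, same shape as df
null_fraction = mask.mean()   # Series indexed by column name, one fraction per column
ranked = null_fraction.sort_values(ascending=False)  # a Series, so no `by` is needed
null_percent = ranked * 100   # same ranking, expressed as percentages

print(ranked)
print(null_percent)
```

Here ranked is ordered a, b, c with values 1.0, 0.5 and 0.0, so a value of 1.0 in the original output means the column contains no non-null values at all.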

For example, a column with the values:

[np.nan, 2, 3, 4]

is evaluated as:

[True, False, False, False]

interpreted as:

[1, 0, 0, 0]

Resulting in:

0.25
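
The same single-column example can be checked directly. This is a small sketch using a standalone Series; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

col = pd.Series([np.nan, 2, 3, 4])

mask = col.isnull()          # True where the value is missing
as_ints = mask.astype(int)   # True -> 1, False -> 0
fraction = mask.mean()       # mean of the mask = share of missing values

print(mask.tolist())     # [True, False, False, False]
print(as_ints.tolist())  # [1, 0, 0, 0]
print(fraction)          # 0.25
```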
zipa answered Jan 29 '23 17:01
