I have a DataFrame df, and the code is written like this:
df.isnull().mean().sort_values(ascending = False)
Here is part of the output:
inq_fi 1.0
sec_app_fico_range_low 1.0
I want to understand how this works. If we use df.isnull() alone, it only returns True or False for each cell. How does mean() then give the right result? My objective is to find the percentage of null values in every column. The output above suggests that inq_fi and sec_app_fico_range_low have all of their values missing.
Also, why are we not passing by to sort_values?
A breakdown looks like this:
df.isnull()
#mask each cell: True where the value is NaN, False otherwise
df.isnull().mean()
#compute the mean of Boolean mask (True evaluates as 1 and False as 0)
df.isnull().mean().sort_values(ascending = False)
#sort the resulting Series by its values, descending
As for the by argument: it is only needed for DataFrame.sort_values, where you must say which column(s) to sort by. A Series has a single set of values, so Series.sort_values sorts by them directly and takes no by parameter.
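The three steps above can be sketched on a small, made-up DataFrame (the column names here are just illustrative):

```python
import numpy as np
import pandas as pd

# hypothetical toy data: one fully-missing column, one partly missing, one complete
df = pd.DataFrame({
    "inq_fi": [np.nan, np.nan, np.nan, np.nan],
    "loan_amnt": [1000, np.nan, 3000, 4000],
    "term": [36, 60, 36, 60],
})

null_share = df.isnull().mean().sort_values(ascending=False)
print(null_share)
# inq_fi is 1.0 (4 of 4 missing), loan_amnt is 0.25 (1 of 4), term is 0.0
```

Because mean() is taken column-wise over the Boolean mask, each entry of null_share is exactly the fraction of NaN cells in that column.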
That said, a column with the values:
[np.nan, 2, 3, 4]
is evaluated as:
[True, False, False, False]
interpreted as:
[1, 0, 0, 0]
Resulting in:
0.25
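You can verify that arithmetic directly on a single Series:

```python
import numpy as np
import pandas as pd

col = pd.Series([np.nan, 2, 3, 4])
mask = col.isnull()   # [True, False, False, False]

# True counts as 1 and False as 0, so the mean is 1 / 4
print(mask.mean())    # 0.25
```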