I've got a dataset with a large number of rows. Some of the values are NaN, like this:
In [91]: df
Out[91]:
   0  1    2    3    4
0  1  3    1    1    1
1  1  3    1    1    1
2  2  3    1    1    1
3  1  1  NaN  NaN  NaN
4  1  3    1    1    1
5  1  1    1    1    1
And I want to count the number of NaN values in each row, like this:
In [91]: list = <somecode with df>
In [92]: list
Out[92]: [0, 0, 0, 3, 0, 0]
What is the best and fastest way to do it?
Since sum() counts True as 1 and False as 0, you can count the number of missing values in each row and column by calling sum() on the result of isnull(). By default it counts missing values in each column; with axis=1 it counts them in each row.
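Here is a minimal, self-contained sketch of both directions (the sample values mirror the frame in the question; the default integer labels are an assumption):

import numpy as np
import pandas as pd

# Rebuild a small frame like the one in the question
df = pd.DataFrame([
    [1, 3, 1, 1, 1],
    [1, 3, 1, 1, 1],
    [2, 3, 1, 1, 1],
    [1, 1, np.nan, np.nan, np.nan],
    [1, 3, 1, 1, 1],
    [1, 1, 1, 1, 1],
])

# Column-wise NaN counts (the default axis)
print(df.isnull().sum())

# Row-wise NaN counts: 0, 0, 0, 3, 0, 0
print(df.isnull().sum(axis=1))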
The count() method directly gives the count of non-NaN values in each column. So we can get the count of NaN values if we know the total number of observations.
df.isnull().sum() will give the column-wise sum of missing values.
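As a quick sketch of the count()-based idea for columns (the small frame here is made up purely for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 2], [np.nan, np.nan, 3], [4, 5, 6]])

# count() returns the number of non-NaN values per column,
# so subtracting from the total number of rows gives the NaN count
print(len(df) - df.count())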
You could first find whether each element is NaN or not with isnull(), and then take a row-wise sum(axis=1):
In [195]: df.isnull().sum(axis=1)
Out[195]:
0    0
1    0
2    0
3    3
4    0
5    0
dtype: int64
And if you want the output as a list, you can do:
In [196]: df.isnull().sum(axis=1).tolist()
Out[196]: [0, 0, 0, 3, 0, 0]
Or use count(), like:
In [130]: df.shape[1] - df.count(axis=1)
Out[130]:
0    0
1    0
2    0
3    3
4    0
5    0
dtype: int64
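Since the question also asks about speed, one way to compare the two approaches is a rough timeit run; the frame size, NaN fraction, and repeat count below are arbitrary choices, and actual timings will vary with your machine and pandas version:

import timeit

import numpy as np
import pandas as pd

# Synthetic frame with roughly 10% NaNs, purely for benchmarking
rng = np.random.default_rng(0)
data = rng.random((100_000, 5))
data[rng.random(data.shape) < 0.1] = np.nan
df = pd.DataFrame(data)

print(timeit.timeit(lambda: df.isnull().sum(axis=1), number=100))
print(timeit.timeit(lambda: df.shape[1] - df.count(axis=1), number=100))

Both approaches are vectorized, so the difference is usually small; df.isnull().sum(axis=1) tends to be the more readable choice.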