Good morning all,
I have a pandas DataFrame containing multiple Series. For a given Series within the DataFrame, the values are a mix of unicode strings, NaN, and int/float. I want to count the NaNs in the Series but cannot use the built-in numpy.isnan function because it cannot safely cast unicode data into a format it can interpret. I have come up with a workaround, but I'm wondering if there is a better/more Pythonic way of accomplishing this task.
Thanks in advance, Myles
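import pandas as pd
import numpy as np
test = pd.Series(data=[np.nan, 2, u'string'])
np.isnan(test).sum()
# Raises TypeError: np.isnan does not support the object dtype
# Workaround: drop the strings first (unicode is Python 2; on Python 3 use str)
test2 = [x for x in test if not isinstance(x, unicode)]
numNaNs = np.isnan(test2).sum()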
If they're real floating-point NaNs, ordinary "count this thing" answers will be foiled by NaN != NaN. If they turn out to be 'NaN' strings instead, the list.count('NaN') method works.
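A minimal sketch of the difference, using made-up lists: an equality test never matches a real floating-point NaN, while string matching works when the values are 'NaN' strings. The variable names are illustrative only.

```python
import math

# Hypothetical sample data, for illustration only
real_nans = [float('nan'), 2, 'string', float('nan')]
string_nans = ['NaN', 2, 'string', 'NaN']

# NaN never compares equal to anything, so an equality test finds nothing
eq_count = sum(1 for x in real_nans if x == float('nan'))

# Checking each float with math.isnan does find them
isnan_count = sum(1 for x in real_nans if isinstance(x, float) and math.isnan(x))

# Plain strings compare equal, so list.count works
str_count = string_nans.count('NaN')

print(eq_count, isnan_count, str_count)  # 0 2 2
```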
The isnull() function returns a boolean Series (or DataFrame) of True and False values. Since True is treated as 1 and False as 0, calling sum() on the result of isnull() returns the number of True values, which is the number of NaN values.
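As a short sketch, assuming a mixed-type Series like the one in the question (the sample values are made up):

```python
import numpy as np
import pandas as pd

# Mixed-type Series similar to the one in the question
test = pd.Series([np.nan, 2, 'string', np.nan])

mask = test.isnull()         # boolean Series: True where the value is missing
nan_count = int(mask.sum())  # True counts as 1, False as 0

print(nan_count)  # 2
```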
Count total NaN in each column of a DataFrame: calling sum() on the DataFrame returned by isnull() gives a Series holding the NaN count for each column.
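A possible illustration of the per-column count. The DataFrame and its column names are hypothetical and not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame, for illustration only
df = pd.DataFrame({
    'a': [np.nan, 2, 'string'],
    'b': [1.0, np.nan, np.nan],
    'c': ['x', 'y', 'z'],
})

per_column = df.isnull().sum()  # Series indexed by column name
total = int(per_column.sum())   # NaN count over the whole DataFrame

print(per_column)
print(total)  # 3
```

Summing the per-column Series a second time gives the overall count, which can be handy as a quick sanity check.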
Use pandas.isnull:
In [24]: test = pd.Series(data = [np.nan, 2, u'string'])
In [25]: pd.isnull(test)
Out[25]:
0     True
1    False
2    False
dtype: bool
Note, however, that pd.isnull also returns True for None:
In [28]: pd.isnull([np.nan, 2, u'string', None])
Out[28]: array([ True, False, False, True], dtype=bool)
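If None should not be counted, one option is to test for floating-point NaN directly. This is only a sketch of an alternative, not part of the answer above; it relies on NaN being the only float that is not equal to itself.

```python
import numpy as np
import pandas as pd

values = [np.nan, 2, 'string', None]

# pd.isnull flags both NaN and None
missing_count = int(pd.isnull(values).sum())

# Count only floating-point NaNs: NaN is the only float with x != x
float_nan_count = sum(1 for x in values if isinstance(x, float) and x != x)

print(missing_count, float_nan_count)  # 2 1
```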