
Count NaNs when unicode values present

Good morning all,

I have a pandas DataFrame containing multiple Series. For a given Series within the DataFrame, the values are a mix of unicode, NaN, and int/float. I want to count the NaNs in the Series, but I cannot use the built-in numpy.isnan function because it cannot safely cast unicode data into a type it can interpret. I have come up with a workaround, but I'm wondering if there is a better, more Pythonic way of accomplishing this task.

Thanks in advance, Myles

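import numpy as np
import pandas as pd

test = pd.Series(data=[np.nan, 2, u'string'])
np.isnan(test).sum()
# Raises TypeError: the object-dtype Series cannot be safely cast for np.isnan

# Workaround: drop the unicode values first (Python 2; use str on Python 3)
test2 = [x for x in test if not isinstance(x, unicode)]
numNaNs = np.isnan(test2).sum()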
Myles Baker, asked Feb 26 '14 13:02




1 Answer

Use pandas.isnull:

In [24]: test = pd.Series(data=[np.nan, 2, u'string'])

In [25]: pd.isnull(test)
Out[25]: 
0     True
1    False
2    False
dtype: bool
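
To get the count the question asks for, the boolean result can be summed, since True counts as 1. A minimal sketch, assuming numpy and pandas are imported under their usual aliases and writing the missing value as np.nan:

```python
import numpy as np
import pandas as pd

test = pd.Series(data=[np.nan, 2, u'string'])

# pd.isnull returns a boolean Series; summing it counts the True entries.
num_nans = int(pd.isnull(test).sum())
print(num_nans)  # 1

# The method form on the Series gives the same count.
assert test.isnull().sum() == num_nans
```

Unlike np.isnan, this works on object-dtype data, so the unicode values do not need to be filtered out first.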

Note, however, that pd.isnull also treats None as missing and returns True for it:

In [28]: pd.isnull([np.nan, 2, u'string', None])
Out[28]: array([ True, False, False,  True], dtype=bool)
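
If None should not be counted, one option is to test each element explicitly. This is only a sketch, not part of the original answer; the names values, nan_only, and missing are made up for illustration, and the Series is built with dtype=object so that None is kept as-is:

```python
import math
import pandas as pd

# Hypothetical sample data mixing a float NaN, a number, a string, and None.
values = pd.Series([float('nan'), 2, u'string', None], dtype=object)

# Count only genuine float NaNs: the isinstance check skips strings and None
# before math.isnan is ever called on them.
nan_only = sum(isinstance(x, float) and math.isnan(x) for x in values)

# pd.isnull counts NaN and None alike.
missing = int(pd.isnull(values).sum())

print(nan_only, missing)  # 1 2
```

The element-wise loop is slower than pd.isnull on large data, so it is mainly worth using when the NaN/None distinction actually matters.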
unutbu, answered Sep 20 '22 18:09