I have a dataframe (in Python 2.7, pandas 0.15.0):
df =
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN
I want to apply a simple function to the rows that do not contain NULL values in a specific column. My function is as simple as possible:
def my_func(row):
    print row
And my apply code is the following:
df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)
It works perfectly. If I check column 'B' for NULL values instead, pd.notnull() works just as well. But if I select column 'C', which contains list objects:
df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)
then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')
Does anybody know why pd.notnull() works only for integer and string columns but not for 'list columns'?
And is there a nicer way to check for NULL values in column 'C' instead of this:
df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)
Thank you!
Python Pandas: checking for null values using notnull(). Called on a DataFrame, notnull() returns boolean values, so the displayed data consists of True and False: False for null values and True for non-null values.
notnull is a pandas function that examines one or more values to check that they are not null. In pandas, null values are represented as NaN (not a number) or None to signify that no data is present. notnull returns False where NaN or None is detected and True otherwise.
You can filter out rows with NaN values in a pandas DataFrame column (string, float, datetime, etc.) by using the DataFrame.dropna() and DataFrame.notnull() methods. Python has no null keyword, so missing data is represented as None or NaN.
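The filtering described above can be sketched as follows. This is a minimal example, assuming a reasonably recent pandas and NumPy; the frame mirrors the df from the question, and the variable names rows_with_b, has_c and rows_with_c are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the df in the question.
df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

# Drop the rows where column 'B' is missing.
rows_with_b = df.dropna(subset=['B'])

# Series.notnull() tests each cell as a whole, so a list in a cell
# counts as a single non-null value instead of being checked elementwise.
has_c = df['C'].notnull()
rows_with_c = df[has_c]
```

Because the check is done on the whole column rather than inside a row-wise lambda, the list in column 'C' never reaches an if condition.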
The problem is that pd.notnull(['foo', 'bar']) operates elementwise and returns array([ True, True], dtype=bool). Your if condition tries to convert that array to a single boolean, and that's when you get the exception.
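The behaviour can be reproduced outside of apply. A small sketch, assuming current pandas and NumPy (the names scalar_result, list_result and raised are only illustrative):

```python
import numpy as np
import pandas as pd

# A scalar input gives a single boolean.
scalar_result = pd.notnull(np.nan)

# A list input gives one boolean per element.
list_result = pd.notnull(['foo', 'bar'])

# An `if` statement calls bool() on its condition; for an array with
# more than one element that raises the ValueError from the question.
try:
    bool(list_result)
    raised = False
except ValueError:
    raised = True
```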
To fix it, you could simply wrap the notnull call with np.all:
df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)
Now you'll see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
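To see how the wrapped check behaves on the different kinds of cell values, here is a rough sketch. cell_not_null is a hypothetical helper name, not part of pandas, and the last input is an extra case that does not appear in the question.

```python
import numpy as np
import pandas as pd

def cell_not_null(value):
    # Hypothetical helper: reduce the elementwise result to one boolean.
    return bool(np.all(pd.notnull(value)))

checks = [
    cell_not_null(np.nan),            # missing scalar
    cell_not_null('two'),             # string scalar
    cell_not_null(['foo', 'bar']),    # list with no missing elements
    cell_not_null(['foo', np.nan]),   # list with a missing element
]
```

Note that with np.all a list containing even one NaN is treated as null. If such a list should still count as present, np.any would be the alternative to consider.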
I had a column that contained lists and NaNs, so the following worked for me:
df.C.map(lambda x: my_func(x) if type(x) == list else x)
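A runnable variant of the same idea, as a sketch: it uses isinstance instead of comparing types directly, which also accepts subclasses of list. Here my_func is a stand-in that returns the list length, chosen only so the result is easy to inspect.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [np.nan, ['foo', 'bar'], np.nan]})

def my_func(value):
    # Stand-in for the real function: return the number of items.
    return len(value)

# Apply my_func only to cells that hold a list; leave NaN cells untouched.
result = df['C'].map(lambda x: my_func(x) if isinstance(x, list) else x)
```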