Python pandas apply function if a column value is not NULL

I have a dataframe (in Python 2.7, pandas 0.15.0):

df =
       A    B               C
0    NaN   11             NaN
1    two  NaN  ['foo', 'bar']
2  three   33             NaN

I want to apply a simple function to the rows that do not contain NULL values in a specific column. My function is as simple as possible:

def my_func(row):
    print row

And my apply code is the following:

df[['A','B']].apply(lambda x: my_func(x) if(pd.notnull(x[0])) else x, axis = 1)

It works perfectly. If I want to check column 'B' for NULL values the pd.notnull() works perfectly as well. But if I select column 'C' that contains list objects:

df[['A','C']].apply(lambda x: my_func(x) if(pd.notnull(x[1])) else x, axis = 1)

then I get the following error message: ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1')

Does anybody know why pd.notnull() works only for integer and string columns but not for 'list columns'?

And is there a nicer way to check for NULL values in column 'C' instead of this:

df[['A','C']].apply(lambda x: my_func(x) if(str(x[1]) != 'nan') else x, axis = 1)

Thank you!

asked Oct 28 '14 17:10

ragesz

2 Answers

The problem is that pd.notnull(['foo', 'bar']) operates elementwise and returns array([ True,  True], dtype=bool). Your if condition tries to convert that array to a single boolean, and that is when you get the exception.
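
The behaviour described above can be reproduced outside of apply. This is a minimal sketch, assuming Python 3 and a recent pandas rather than the Python 2.7 / pandas 0.15.0 setup from the question; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# A scalar input gives back a single boolean.
scalar_result = pd.notnull(np.nan)

# A list is treated as array-like, so every element is checked separately
# and the result is a NumPy array with one boolean per element.
list_result = pd.notnull(['foo', 'bar'])

# Using a multi-element array where a single boolean is expected
# (as the "if" inside the lambda does) raises ValueError.
try:
    bool(list_result)
    raised = False
except ValueError:
    raised = True

print(scalar_result, list_result, raised)
```

So pd.notnull() itself does not fail on the list cell; the failure comes from using its array result as a condition.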

To fix it, you could simply wrap the notnull call with np.all:

df[['A','C']].apply(lambda x: my_func(x) if(np.all(pd.notnull(x[1]))) else x, axis = 1)

Now you'll see that np.all(pd.notnull(['foo', 'bar'])) is indeed True.
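
As a rough sketch of the same idea on the question's data (assuming Python 3 and a recent pandas), the check can be pulled into a small helper and used to build a row mask. The helper name has_value is hypothetical and not part of pandas, and the column is accessed by its label 'C' instead of by position.

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

def has_value(cell):
    # Hypothetical helper: collapse the elementwise result of pd.notnull
    # into one plain bool, so it works for scalars and for lists.
    return bool(np.all(pd.notnull(cell)))

# One boolean per row: True where column 'C' holds a usable value.
mask = df.apply(lambda row: has_value(row['C']), axis=1)

# Only the rows that passed the check.
selected = df[mask]
print(selected)
```

One thing to keep in mind with this approach: because np.all requires every element to be non-null, a list that itself contains a NaN, such as ['foo', np.nan], is treated as missing.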

answered Sep 23 '22 09:09

Korem

I had a column that contained lists and NaNs, so the following worked for me:

df.C.map(lambda x: my_func(x) if type(x) == list else x)
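
A possible variation on this, sketched for Python 3 and a recent pandas: isinstance is the more common way to test for a list, and Series.notna() checks each cell as a whole, so a list counts as a single non-null value. The my_func below is a hypothetical stand-in that joins the list items, not the function from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 'two', 'three'],
    'B': [11, np.nan, 33],
    'C': [np.nan, ['foo', 'bar'], np.nan],
})

def my_func(value):
    # Hypothetical stand-in for the question's function.
    return ', '.join(value)

# Apply my_func only to cells that actually hold a list; leave NaN as it is.
joined = df['C'].map(lambda x: my_func(x) if isinstance(x, list) else x)

# Called on the column, notna() does not look inside the list.
mask = df['C'].notna()

print(joined)
print(mask)
```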

answered Sep 23 '22 09:09

coffman21

Tags: python, list, null, pandas, apply
