Why is pandas.apply() executing on null elements?

Tags: python, pandas

Supposedly, the pandas apply() function does not apply to null elements. However, that is not what happens in the following code, where the function is called on None and fails. Why is this happening?

import pandas as pd
df = pd.Series([[1,2],[2,3,4,5],None])
df
0          [1, 2]
1    [2, 3, 4, 5]
2            None
dtype: object
df.apply(lambda x: len(x))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alexander\Anaconda3\lib\site-packages\pandas\core\series.py", line 2169, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1059, in pandas.lib.map_infer (pandas\lib.c:62578)
  File "<stdin>", line 1, in <lambda>
TypeError: object of type 'NoneType' has no len()
Alex asked Jan 03 '16 08:01


1 Answer

For missing-data purposes, pandas treats None and NaN as equivalent, so there is no point in replacing None with numpy.nan: apply will still call the function on NaN elements.

import numpy
df[2] = numpy.nan
df.apply(lambda x: print(x))

Output: [1, 2]
        [2, 3, 4, 5]
        nan

You have to either check for a missing value inside the function you apply, or call dropna() on the Series and apply the function to the result:

df.dropna().apply(lambda x: print(x))
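For the first option (checking inside the function), here is a minimal sketch. The helper name safe_len is made up for illustration, and it assumes the only missing markers in the data are None and float NaN. It avoids calling pandas.isnull on each element because, for a list, that returns an array rather than a single boolean.

```python
import math

import pandas as pd

s = pd.Series([[1, 2], [2, 3, 4, 5], None])


def safe_len(x):
    # Hypothetical helper: treat None and float NaN as missing and
    # return NaN for them instead of calling len().
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return float("nan")
    return len(x)


# The missing element stays in the result as a missing value.
lengths = s.apply(safe_len)
print(lengths)
```

This keeps the result aligned with the original index, at the cost of a per-element check.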

Alternatively, use notnull(), which returns a boolean Series that you can use as a mask:

df[df.notnull()].apply(lambda x: print(x))
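Both dropna() and the notnull() mask remove the missing rows from the result. If the output should keep the original index, one possible approach (a sketch, not something the original answer proposes) is to reindex the result afterwards, so the skipped rows come back as NaN:

```python
import pandas as pd

s = pd.Series([[1, 2], [2, 3, 4, 5], None])

# Apply len only to the non-missing rows; the result has index [0, 1].
present = s[s.notnull()].apply(len)

# Reindex against the original index; row 2 reappears as NaN,
# which also turns the integer lengths into floats.
aligned = present.reindex(s.index)
print(aligned)
```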

Please also read: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

And specifically, this:

Warning:

One has to be mindful that in python (and numpy), the nan's don’t compare equal, but None's do. Note that Pandas/numpy uses the fact that np.nan != np.nan, and treats None like np.nan.
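The behaviour described in that warning can be checked directly. This is a small illustration using an example Series that is not from the question:

```python
import numpy as np
import pandas as pd

# NaN never compares equal to itself, while None does.
nan_equal = np.nan == np.nan
none_equal = None == None  # noqa: E711 (deliberate equality check)

# pandas nevertheless reports both None and NaN as missing.
missing = pd.Series([[1, 2], None, np.nan]).isnull().tolist()

print(nan_equal, none_equal, missing)
# prints: False True [False, True, True]
```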

kliron answered Oct 16 '22 21:10