Pandas Lambda Function with Nan Support

I am trying to write a lambda function in Pandas that checks whether Col1 is NaN and, if so, uses another column's data instead. I am having trouble getting the code below to run correctly.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [7, 8, 9, 10]})
df2=df.apply(lambda x: x['Col2'] if x['Col1'].isnull() else x['Col1'], axis=1)

Does anyone have a good idea of how to write this with a lambda function, or have I exceeded what a lambda can do? If so, is there another solution? Thanks.

asked May 19 '17 at 04:05 by Tyler Russell



3 Answers

You need pandas.isnull to check whether a scalar is NaN:

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})

df2 = df.apply(lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1)

print (df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3   NaN    10

print (df2)
0     1.0
1     2.0
2     3.0
3    10.0
dtype: float64
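The question's x['Col1'].isnull() fails because, with axis=1, each row is passed to the lambda as a Series, so x['Col1'] is a single scalar rather than a Series. A minimal sketch on the question's data (assuming a recent pandas and NumPy) makes that visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# With axis=1 each row arrives as a Series; indexing it returns a scalar.
row = df.iloc[3]
value = row['Col1']
print(type(value))  # a numpy.float64 scalar, not a Series

# Scalars have no .isnull() method, which is why the original lambda fails.
print(hasattr(value, 'isnull'))  # False

# pd.isnull (alias pd.isna) accepts scalars as well as array-likes.
print(pd.isnull(value))               # True
print(pd.isnull(df.iloc[0]['Col1']))  # False
```

Any scalar-aware check (pd.isnull, pd.isna, or np.isnan for floats) works inside the lambda; the .isnull() method only exists on pandas objects such as Series and DataFrame.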

A better option is Series.combine_first, which fills the NaNs in Col1 from Col2 without a row-wise apply:

df['Col1'] = df['Col1'].combine_first(df['Col2'])

print (df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3  10.0    10
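For comparison, the same fill can be written with Series.where, which keeps values where a condition holds and takes them from another Series elsewhere. This is an alternative sketch, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# Keep Col1 where it is not NaN; elsewhere take the value from Col2.
filled = df['Col1'].where(df['Col1'].notna(), df['Col2'])

# For this data the result matches combine_first.
print(filled.tolist())  # [1.0, 2.0, 3.0, 10.0]
```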

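Another solution is Series.update. Note that it overwrites Col1 with every non-NA value of Col2, not only where Col1 is NaN, as the output shows: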

df['Col1'].update(df['Col2'])
print (df)
   Col1  Col2
0   8.0     8
1   9.0     9
2   7.0     7
3  10.0    10
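If the intent is to use update only for the missing entries, one option (a sketch, not from the original answer) is to mask Col2 so that only rows where Col1 is NaN carry a value, since Series.update skips NA values in its argument:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# Series.update copies every non-NA value of the other Series, so hide the
# Col2 values we do not want by turning them into NaN first.
replacement = df['Col2'].where(df['Col1'].isna())

# Update an explicit copy, then assign it back to the DataFrame.
col1 = df['Col1'].copy()
col1.update(replacement)
df['Col1'] = col1

print(df['Col1'].tolist())  # [1.0, 2.0, 3.0, 10.0]
```

Updating an explicit copy and assigning it back also avoids calling an in-place method directly on df['Col1'], which recent pandas versions may flag as chained assignment.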

answered Nov 08 '22 at 23:11 by jezrael


With pandas 0.24.2, I use

df.apply(lambda x: x['col_name'] if x['col1'] is np.nan else expressions_another, axis=1)

because pd.isnull() did not work for me.
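Whether an identity check like `is np.nan` works depends on the column's dtype. A minimal sketch on the question's all-numeric DataFrame, assuming a recent pandas where each row reaches the lambda as a float64 Series, shows the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [8, 9, 7, 10]})

# Identity test: only true if the cell holds the very same np.nan object.
by_identity = df.apply(lambda x: x['Col2'] if x['Col1'] is np.nan else x['Col1'], axis=1)

# Value test: pd.isna recognises any NaN scalar, including numpy.float64.
by_isna = df.apply(lambda x: x['Col2'] if pd.isna(x['Col1']) else x['Col1'], axis=1)

print(by_identity.tolist())  # [1.0, 2.0, 3.0, nan]  (NaN not replaced)
print(by_isna.tolist())      # [1.0, 2.0, 3.0, 10.0]
```

Here x['Col1'] is a numpy.float64 NaN rather than the np.nan object itself, so the identity test is False while pd.isna is True; results with pandas 0.24.2 or with object columns may differ.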

In my work, I found the following behaviour:

This version gives no results:

df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if pd.isnull(x['cnumpday']) else np.nan, axis=1)

This version gives results:

df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if x['cnumpday'] is not np.nan else np.nan, axis=1)

So far I still don't know the deeper reason, but from experience: for object columns, use `is np.nan` or pd.isna(); for float columns, use np.isnan() or pd.isna().
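One plausible explanation for the two snippets behaving differently: the first condition is inverted, so it divides only when cnumpday is null and returns np.nan everywhere else, while `is not np.nan` is nearly always True for float cells, so the second snippet nearly always divides. A sketch with hypothetical data (the values are invented; only the column names come from the snippets above):

```python
import numpy as np
import pandas as pd

# Hypothetical data reusing the column names from the snippets above.
df = pd.DataFrame({'buynumpday': [2.0, 3.0, 4.0], 'cnumpday': [4.0, np.nan, 8.0]})

# As in the first snippet: divide only when the denominator IS null.
inverted = df.apply(
    lambda x: x['buynumpday'] / x['cnumpday'] if pd.isnull(x['cnumpday']) else np.nan,
    axis=1)

# Condition flipped: divide only when the denominator is NOT null.
fixed = df.apply(
    lambda x: x['buynumpday'] / x['cnumpday'] if pd.notnull(x['cnumpday']) else np.nan,
    axis=1)

# Column-wise division already yields NaN wherever the denominator is NaN.
vectorized = df['buynumpday'] / df['cnumpday']

print(inverted.tolist())  # every entry is NaN
print(fixed.tolist())     # [0.5, nan, 0.5]
```

Because plain column division already propagates NaN, the row-wise apply is not needed for this particular calculation.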

answered Nov 08 '22 at 23:11 by jiahe


Assuming that you do have a second column, that is:

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [1, 2, 3, 4]})

The correct solution to this problem would be:

df['Col1'] = df['Col1'].fillna(df['Col2'])
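A minimal sketch of the same idea that assigns the result back rather than using inplace=True on a column selection, which recent pandas versions flag as chained assignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [1, 2, 3, 4]})

# fillna with a Series fills each NaN from the same index label in Col2.
df['Col1'] = df['Col1'].fillna(df['Col2'])

print(df['Col1'].tolist())  # [1.0, 2.0, 3.0, 4.0]
```

Because fillna aligns on the index, both columns need to share the same index for the fill to line up row by row.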

answered Nov 08 '22 at 23:11 by Gerges