I am trying to write a lambda function in pandas that checks whether Col1 is NaN and, if so, uses another column's data. I am having trouble getting the code below to execute correctly.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [7, 8, 9, 10]})
df2=df.apply(lambda x: x['Col2'] if x['Col1'].isnull() else x['Col1'], axis=1)
Does anyone have a good idea of how to write a solution like this with a lambda function, or have I exceeded the abilities of lambda? If so, do you have another solution? Thanks.
One common way to check whether a pandas DataFrame contains any NaN is df.isnull().values.any().
Pandas treats None and NaN as essentially interchangeable for indicating missing or null values.
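As a rough illustration of the two points above, here is a minimal sketch. The sample frame mirrors the one in the question, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [7, 8, 9, 10]})

# True if at least one cell anywhere in the frame is missing
has_any_nan = bool(df.isnull().values.any())

# Number of missing cells per column
nan_counts = df.isnull().sum()

# None is reported as missing as well (object column)
none_mask = pd.Series([1, None, "a"], dtype=object).isnull()

print(has_any_nan)
print(nan_counts)
print(none_mask.tolist())
```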
You need pandas.isnull to check whether a scalar is NaN:
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan],
                   'Col2': [8, 9, 7, 10]})
df2 = df.apply(lambda x: x['Col2'] if pd.isnull(x['Col1']) else x['Col1'], axis=1)
print(df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3   NaN    10
print(df2)
0     1.0
1     2.0
2     3.0
3    10.0
dtype: float64
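To see why the lambda in the question fails while the pd.isnull version works, it may help to look at what a single row actually holds. This is a small sketch using the same sample data. With axis=1 each row arrives as a Series and x['Col1'] is a plain scalar, which has no .isnull() method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [8, 9, 7, 10]})

row = df.iloc[3]       # one row, as a Series
value = row["Col1"]    # a NumPy float scalar, not a Series

# Scalars have no .isnull() method, so x['Col1'].isnull() raises AttributeError
scalar_has_isnull = hasattr(value, "isnull")

# pd.isnull accepts scalars, so it can be used inside the lambda
df2 = df.apply(lambda x: x["Col2"] if pd.isnull(x["Col1"]) else x["Col1"], axis=1)

print(scalar_has_isnull)
print(df2.tolist())
```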
A better option, though, is Series.combine_first:
df['Col1'] = df['Col1'].combine_first(df['Col2'])
print(df)
   Col1  Col2
0   1.0     8
1   2.0     9
2   3.0     7
3  10.0    10
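As a quick sanity check, the sketch below compares the row-wise lambda with combine_first on the same sample data. The names via_apply and via_combine are just illustrative. The results are stored in new variables so that the original frame is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [8, 9, 7, 10]})

# Row-wise version: one Python call per row
via_apply = df.apply(
    lambda x: x["Col2"] if pd.isnull(x["Col1"]) else x["Col1"], axis=1
)

# Vectorized version: take Col1, fall back to Col2 where Col1 is missing
via_combine = df["Col1"].combine_first(df["Col2"])

same = via_apply.tolist() == via_combine.tolist()
print(same)
```

Both give the same values here. The vectorized call avoids running a Python function for every row, which usually matters on larger frames.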
Another solution is Series.update. Note that it overwrites every value in Col1 for which Col2 is not NaN, not only the missing ones:
df['Col1'].update(df['Col2'])
print(df)
   Col1  Col2
0   8.0     8
1   9.0     9
2   7.0     7
3  10.0    10
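The difference between update and combine_first is easy to miss in the output above, so here is a small side-by-side sketch on the same data. It works on a copy of the column (s_update is an illustrative name) so the frame itself stays as it was.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [8, 9, 7, 10]})

# update: modifies the Series in place, taking every non-NaN value from Col2
s_update = df["Col1"].copy()
s_update.update(df["Col2"])

# combine_first: keeps existing Col1 values and only fills the gaps
s_combine = df["Col1"].combine_first(df["Col2"])

print(s_update.tolist())
print(s_combine.tolist())
```

If the goal is only to fill the missing entries of Col1, combine_first (or fillna) matches the question more closely than update does.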
With pandas 0.24.2, I use
df.apply(lambda x: x['col_name'] if x[col1] is np.nan else expressions_another, axis=1)
because pd.isnull() did not work for me.
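One caveat with the `is np.nan` test is that it is an identity check: it is True only for the np.nan object itself, not for every NaN value. The sketch below shows this on a recent pandas and NumPy. I have not verified the behaviour on 0.24.2, and the sample frame is the one from the question.

```python
import numpy as np
import pandas as pd

other_nan = float("nan")              # a NaN that is not the np.nan object

identity_hit = np.nan is np.nan       # same object
identity_miss = other_nan is np.nan   # different object, although still NaN
isna_hit = pd.isna(other_nan)         # value-based check

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [7, 8, 9, 10]})

# In an all-numeric frame each row is a float64 Series, so x['Col1'] is a
# NumPy float scalar and never the np.nan object itself
by_identity = df.apply(lambda x: x["Col1"] is np.nan, axis=1).tolist()
by_isna = df.apply(lambda x: pd.isna(x["Col1"]), axis=1).tolist()

print(by_identity)
print(by_isna)
```

With object columns the stored value may really be the np.nan object, in which case the identity check happens to work. pd.isna covers both cases.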
In my work, I ran into the following behaviour.
This gives no results:
df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if pd.isnull(x['cnumpday']) else np.nan, axis=1)
This gives results:
df['prop'] = df.apply(lambda x: (x['buynumpday'] / x['cnumpday']) if x['cnumpday'] is not np.nan else np.nan, axis=1)
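One difference is that the first version tests pd.isnull(x['cnumpday']) without a "not", so the two conditions are not equivalent. Beyond that, I still don't know the deeper reason, but my experience is: for an object column, use `is np.nan` or pd.isna(); for a float column, use np.isnan() or pd.isna().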
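To make the comparison concrete, here is a minimal sketch with made-up values for buynumpday and cnumpday (the numbers are hypothetical; only the column names come from the snippets above). It runs the condition as written in the first snippet, the negated condition, and a plain vectorized division.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({"buynumpday": [2.0, 6.0, 5.0], "cnumpday": [4.0, np.nan, 10.0]})

# As written in the first snippet: divides only when the denominator IS missing
inverted = df.apply(
    lambda x: (x["buynumpday"] / x["cnumpday"]) if pd.isnull(x["cnumpday"]) else np.nan,
    axis=1,
)

# Negated condition: divides only when the denominator is present
fixed = df.apply(
    lambda x: (x["buynumpday"] / x["cnumpday"]) if pd.notnull(x["cnumpday"]) else np.nan,
    axis=1,
)

# Plain column division already propagates NaN, so no lambda is needed
vectorized = df["buynumpday"] / df["cnumpday"]

print(inverted.tolist())
print(fixed.tolist())
print(vectorized.tolist())
```

With this data the first version is NaN in every row, while the other two agree with each other.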
Assuming that you do have a second column, that is:
df = pd.DataFrame({'Col1': [1, 2, 3, np.nan], 'Col2': [1, 2, 3, 4]})
A straightforward solution to this problem would be:
df['Col1'] = df['Col1'].fillna(df['Col2'])
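Here is a short sketch of the fillna approach that assigns the result back instead of relying on inplace=True on a selected column. As far as I know, recent pandas versions warn about that chained in-place pattern, so plain assignment seems the safer form. The names filled and untouched are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2, 3, np.nan], "Col2": [1, 2, 3, 4]})

# fillna with a Series fills each missing entry from the same index label
filled = df["Col1"].fillna(df["Col2"])

# fillna returns a new Series, so the frame still has its NaN at this point
untouched = int(df["Col1"].isna().sum())

# Assign back to store the result in the frame
df["Col1"] = filled

print(df["Col1"].tolist())
```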