having the following example:
import pandas as pd
df = pd.DataFrame({ 'Column A' : ['null',20,30,40,'null'],'Column B' : [100,'null',30,50,'null']});

I need a Python function that takes two columns and compare them:
In case one column is a missing value, we fill it from the other column.
In case both values are 'NULL', we Keep 'NULL'.
In case of different values (inconsistent) replace both values with 'NULL'
return with one attribute
the data should look like this after running the function.

this is what I did so far, I need help implementing step number 3
def myFunction(firAttribute,secAttribute):
x=df.ix[:,[firAttribute,secAttribute]]
x['new']=x[firAttribute].fillna(x[secAttribute])
x['new2']=x[secAttribute].fillna(x[firAttribute])
x['new'] =x['new'].fillna(x['new2'])
return x['new']
You can first replace null to NaN, then combine_first NaN between columns and last use boolean indexing for matching different columns values and fill them NaN:
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'Column A' : ['null',20,30,40,'null'],
'Column B' : [100,'null',30,50,'null']});
print df
Column A Column B
0 null 100
1 20 null
2 30 30
3 40 50
4 null null
#replace null to NaN
df = df.replace("null", np.nan)
print df
Column A Column B
0 NaN 100
1 20 NaN
2 30 30
3 40 50
4 NaN NaN
df['Column A'] = df['Column A'].combine_first(df['Column B'])
df['Column B'] = df['Column B'].combine_first(df['Column A'])
print df
Column A Column B
0 100 100
1 20 20
2 30 30
3 40 50
4 NaN NaN
#inconsistent values replace to NaN
df[df['Column A'] != df['Column B']] = np.nan
print df
Column A Column B
0 100 100
1 20 20
2 30 30
3 NaN NaN
4 NaN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With