Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace missing and inconstant value, Pythons

having the following example:

 import pandas as pd
df = pd.DataFrame({ 'Column A' : ['null',20,30,40,'null'],'Column B' : [100,'null',30,50,'null']});

The link for the example

I need a Python function that takes two columns and compare them:

  1. In case one column is a missing value, we fill it from the other column.

  2. In case both values are 'NULL', we Keep 'NULL'.

  3. In case of different values (inconsistent) replace both values with 'NULL'

  4. return with one attribute

the data should look like this after running the function. the link for the result

this is what I did so far, I need help implementing step number 3

def myFunction(firAttribute,secAttribute):
    x=df.ix[:,[firAttribute,secAttribute]]
    x['new']=x[firAttribute].fillna(x[secAttribute])
    x['new2']=x[secAttribute].fillna(x[firAttribute])
    x['new'] =x['new'].fillna(x['new2'])
    return x['new'] 
like image 689
alghamdi Avatar asked Jan 19 '26 07:01

alghamdi


1 Answers

You can first replace null to NaN, then combine_first NaN between columns and last use boolean indexing for matching different columns values and fill them NaN:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'Column A' : ['null',20,30,40,'null'],
                    'Column B' : [100,'null',30,50,'null']});
print df
  Column A Column B
0     null      100
1       20     null
2       30       30
3       40       50
4     null     null

#replace null to NaN
df = df.replace("null", np.nan)
print df
   Column A  Column B
0       NaN       100
1        20       NaN
2        30        30
3        40        50
4       NaN       NaN
df['Column A'] = df['Column A'].combine_first(df['Column B'])
df['Column B'] = df['Column B'].combine_first(df['Column A'])
print df
   Column A  Column B
0       100       100
1        20        20
2        30        30
3        40        50
4       NaN       NaN

#inconsistent values replace to NaN
df[df['Column A'] != df['Column B']] = np.nan
print df
   Column A  Column B
0       100       100
1        20        20
2        30        30
3       NaN       NaN
4       NaN       NaN
like image 176
jezrael Avatar answered Jan 20 '26 19:01

jezrael