I have a DataFrame , say df1, which has all the columns correct except the 'Employee' column. There is another DataFrame , say df2, which has correct Employee names but stored in the column 'Staff'. I am trying to update df1 based on 'key_df1' and 'key_df2' from the respective DataFrames. Need some help on how to approach this question. (Please see below the expected output in the image)
data1=[['NYC-URBAN','JON','$5000','yes','BANKING','AC32456'],['WDC-RURAL','XING','$4500','Yes','FINANCE','AD45678'],['LONDON-URBAN','EDWARDS','$3500','No','IT','DE43216'],
['SINGAPORE-URBAN','WOLF','$5000','No','SPORTS','RT45327'],['MUMBAI-RURAL','NEMBIAR','$2500','No','IT','Rs454457']]
data2=[['NYC','MIKE','BANKING','BIKING','AH56245'],['WDC','ALPHA','FINANCE','TREKKING','AD45678'],
['LONDON-URBAN','BETA','FINANCE','SLEEPING','DE43216'],['SINGAPORE','WOLF','SPORTS','DANCING','RT45307'],
['MUMBAI','NEMBIAR','IT','ZUDO','RS454453']]
List1=['City','Employee', 'Income','Travelling','Industry', 'Key_df1']
List2=['City','Staff','Industry','Hobby', 'Key_df1']
df1=pd.DataFrame(data1,columns=List1)
df2=pd.DataFrame(data2,columns=List2)
Expected Ouput:

Edit (Additional Query):
Thanks for the response. Along with the above question, I want to concatenate value of 'Employee' column with the 'Travelling' Column from df1 only for the rows in which the Key_df1 and Key_df2 ties in the two DataFrames. Please see below the second expected output.

First set the index in df1 to Key_df1 and save it as a temporary DataFrame:
wrk = df1.set_index('Key_df1')
Then update (in-place) its Employee column using df2 with the index set to Key_df2, taking only Staff column:
wrk.Employee.update(df2.set_index('Key_df2').Staff)
And the last operation is to change the index to a "regular" column and move it to the previous location:
result = wrk.reset_index().reindex(columns=List1)
The result is:
City Employee Income Travelling Industry Key_df1
0 NYC-URBAN JON $5000 yes BANKING AC32456
1 WDC-RURAL ALPHA $4500 Yes FINANCE AD45678
2 LONDON-URBAN BETA $3500 No IT DE43216
3 SINGAPORE-URBAN WOLF $5000 No SPORTS RT45327
4 MUMBAI-RURAL NEMBIAR $2500 No IT Rs454457
Now just update is not enough and the task must be solved another way.
Start from joining df1 with df2.Staff (with set_index to join properly):
result = df1.join(df2.set_index('Key_df2').Staff, on='Key_df1')
The second step (the actual update) is:
result.Employee.where(result.Staff.isna(), result.Staff + '_' + result.Travelling,
inplace=True)
And the last step is to drop Staff column (not necessary any more):
result.drop(columns=['Staff'], inplace=True)
The final result is:
City Employee Income Travelling Industry Key_df1
0 NYC-URBAN JON $5000 yes BANKING AC32456
1 WDC-RURAL ALPHA_Yes $4500 Yes FINANCE AD45678
2 LONDON-URBAN BETA_No $3500 No IT DE43216
3 SINGAPORE-URBAN WOLF $5000 No SPORTS RT45327
4 MUMBAI-RURAL NEMBIAR $2500 No IT Rs454457
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With