Updating a dataframe based on another dataframe in Python

Question

I have a DataFrame , say df1, which has all the columns correct except the 'Employee' column. There is another DataFrame , say df2, which has correct Employee names but stored in the column 'Staff'. I am trying to update df1 based on 'key_df1' and 'key_df2' from the respective DataFrames. Need some help on how to approach this question. (Please see below the expected output in the image)

data1=[['NYC-URBAN','JON','$5000','yes','BANKING','AC32456'],['WDC-RURAL','XING','$4500','Yes','FINANCE','AD45678'],['LONDON-URBAN','EDWARDS','$3500','No','IT','DE43216'],
     ['SINGAPORE-URBAN','WOLF','$5000','No','SPORTS','RT45327'],['MUMBAI-RURAL','NEMBIAR','$2500','No','IT','Rs454457']]

data2=[['NYC','MIKE','BANKING','BIKING','AH56245'],['WDC','ALPHA','FINANCE','TREKKING','AD45678'],
     ['LONDON-URBAN','BETA','FINANCE','SLEEPING','DE43216'],['SINGAPORE','WOLF','SPORTS','DANCING','RT45307'],
     ['MUMBAI','NEMBIAR','IT','ZUDO','RS454453']]

List1=['City','Employee', 'Income','Travelling','Industry', 'Key_df1']
List2=['City','Staff','Industry','Hobby', 'Key_df1']

df1=pd.DataFrame(data1,columns=List1)
df2=pd.DataFrame(data2,columns=List2)

Expected Ouput:

enter image description here

Edit (Additional Query):

Thanks for the response. Along with the above question, I want to concatenate value of 'Employee' column with the 'Travelling' Column from df1 only for the rows in which the Key_df1 and Key_df2 ties in the two DataFrames. Please see below the second expected output.

enter image description here

Valdi_Bo · Accepted Answer

First set the index in df1 to Key_df1 and save it as a temporary DataFrame:

wrk = df1.set_index('Key_df1')

Then update (in-place) its Employee column using df2 with the index set to Key_df2, taking only Staff column:

wrk.Employee.update(df2.set_index('Key_df2').Staff)

And the last operation is to change the index to a "regular" column and move it to the previous location:

result = wrk.reset_index().reindex(columns=List1)

The result is:

              City Employee Income Travelling Industry   Key_df1
0        NYC-URBAN      JON  $5000        yes  BANKING   AC32456
1        WDC-RURAL    ALPHA  $4500        Yes  FINANCE   AD45678
2     LONDON-URBAN     BETA  $3500         No       IT   DE43216
3  SINGAPORE-URBAN     WOLF  $5000         No   SPORTS   RT45327
4     MUMBAI-RURAL  NEMBIAR  $2500         No       IT  Rs454457

Edit following the comment about Travelling column

Now just update is not enough and the task must be solved another way.

Start from joining df1 with df2.Staff (with set_index to join properly):

result = df1.join(df2.set_index('Key_df2').Staff, on='Key_df1')

The second step (the actual update) is:

result.Employee.where(result.Staff.isna(), result.Staff + '_' + result.Travelling,
    inplace=True)

And the last step is to drop Staff column (not necessary any more):

result.drop(columns=['Staff'], inplace=True)

The final result is:

              City   Employee Income Travelling Industry   Key_df1
0        NYC-URBAN        JON  $5000        yes  BANKING   AC32456
1        WDC-RURAL  ALPHA_Yes  $4500        Yes  FINANCE   AD45678
2     LONDON-URBAN    BETA_No  $3500         No       IT   DE43216
3  SINGAPORE-URBAN       WOLF  $5000         No   SPORTS   RT45327
4     MUMBAI-RURAL    NEMBIAR  $2500         No       IT  Rs454457

Updating a dataframe based on another dataframe in Python

Tags:

python

pandas

dataframe

vlookup

Ussu20

1 Answers

Edit following the comment about Travelling column

Valdi_Bo

Recent Activity

Donate For Us

Updating a dataframe based on another dataframe in Python

Tags:

python

pandas

dataframe

vlookup

Ussu20

1 Answers

Edit following the comment about Travelling column

Valdi_Bo

Related questions

Recent Activity

Donate For Us