I have two dataframes df1 and df2 . They both contain time-series data, so it is possible some of the dates in df1 and df2 intersect with each other and the rest don't. My requirement is an operation on the two dataframes that replaces the values in df1 with the values in df2 for the same dates, leaves alone values for indexes in df1 not present in df2 and adds the values for indexes present in df2 and not in df1. Consider the following example:
df1:
A B C D
0 A0 BO C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df2:
A B C E
1 A4 B4 C4 E4
2 A5 B5 C5 E5
3 A6 B6 C6 E6
4 A7 B7 C7 E7
result df:
A B C D E
0 A0 BO C0 D0 Nan
1 A4 B4 C4 D4 E4
2 A5 B5 C5 D5 E5
3 A6 B6 C6 D6 E6
4 A7 B7 C7 D7 E7
I tried to develop the logic with the first step concatenating the two dfs but that leads to rows with duplicate indexes and am not sure how to handle that. How can this be achieved? Any suggestions would help
Edit: A simpler case would be when the column names are same in the two dataframes. So consider df2 has column D instead of E with values D4,D5,D6,D7.
A concatenation yields the following result:
concat(df1,df2,axis=1)
A B C D A B C D
0 A0 B0 C0 D0 NaN NaN NaN NaN
1 A1 B1 C1 D1 A4 B4 C4 D4
2 A2 B2 C2 D2 A5 B5 C5 D5
3 A3 B3 C3 D3 A6 B6 C6 D6
4 NaN NaN NaN NaN A7 B7 C7 D7
Now this introduces duplicate columns. A conventional solution would be to loop through each column but I am looking for a more elegant solution. Any ideas would be appreciated.
update
will align on the indices of both DataFrames:
df1.update(df2)
df1:
A B C D
0 A0 BO C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
df2:
A B C D
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
4 A7 B7 C7 D7
>>> df1.update(df2)
A B C D
0 A0 BO C0 D0
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
You then need to add the values in df2 not present in df1:
>>> df1.append(df2.loc[[i for i in df2.index if i not in df1.index], :])
Out[46]:
A B C D
0 A0 BO C0 D0
1 A4 B4 C4 D4
2 A5 B5 C5 D5
3 A6 B6 C6 D6
4 A7 B7 C7 D7
I just saw this question and realized it is almost identical to one that I just asked today and that @Alexander (the poster of the answer above) answered very nicely:
pd.concat([df1[~df1.index.isin(df2.index)], df2])
See pandas DataFrame concat / update ("upsert")? for the discussion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With