I have two DataFrames and want to take rows from the second one only where the index is not already present in the first one.
What is the most efficient way to do this?
Example:
df_1
idx val
0 0.32
1 0.54
4 0.26
5 0.76
7 0.23
df_2
idx val
1 10.24
2 10.90
3 10.66
4 10.25
6 10.13
7 10.52
df_final
idx val
0 0.32
1 0.54
2 10.90
3 10.66
4 0.26
5 0.76
6 10.13
7 0.23
Recap: I need to add the rows in df_2 for which the index is not already in df_1.
EDIT: Removed some indices in df_2 to illustrate that not all indices from df_1 are covered in df_2.
You can use combine_first, or reindex with fillna:
# combine_first aligns on the union of both indexes and keeps df_1's non-null values
df = df_1.combine_first(df_2)
print(df)
val
idx
0 0.32
1 0.54
2 10.90
3 10.66
4 0.26
5 0.76
6 10.13
7 0.23
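# reindex df_1 to the union of both indexes, then fill the new (NaN) rows from df_2
df = df_1.reindex(df_1.index.union(df_2.index)).fillna(df_2)
print(df)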
val
idx
0 0.32
1 0.54
2 10.90
3 10.66
4 0.26
5 0.76
6 10.13
7 0.23
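As an alternative that follows the recap in the question more literally, the rows of df_2 can be filtered with index.isin and then appended with concat. This is only a sketch: the frames are rebuilt here from the example data, and the names new_rows and df_final are chosen for illustration.

```python
import pandas as pd

# Rebuild the example frames from the question.
df_1 = pd.DataFrame({"val": [0.32, 0.54, 0.26, 0.76, 0.23]},
                    index=pd.Index([0, 1, 4, 5, 7], name="idx"))
df_2 = pd.DataFrame({"val": [10.24, 10.90, 10.66, 10.25, 10.13, 10.52]},
                    index=pd.Index([1, 2, 3, 4, 6, 7], name="idx"))

# Keep only the rows of df_2 whose index label is absent from df_1.
new_rows = df_2[~df_2.index.isin(df_1.index)]

# Append them to df_1 and restore index order.
df_final = pd.concat([df_1, new_rows]).sort_index()
print(df_final)
```

One difference worth knowing: combine_first and fillna also replace NaN values inside df_1 with the matching values from df_2, while this version leaves df_1 untouched. For the example data, which has no missing values, all three approaches should give the same result.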