How to Keep index when using Pandas Merge. By default, Pandas merge creates a new integer index for the merged DataFrame. If we wanted to preserve the index from the first DataFrame as the index of the merged DataFrame, we can specify the index explicitly using . set_axis(df1.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
pd. concat joins on the index and can join two or more DataFrames at once. It does a full outer join by default.
The merge() function is used to merge DataFrame or named Series objects with a database-style join. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
In [5]: a.reset_index().merge(b, how="left").set_index('index')
Out[5]:
col1 to_merge_on col2
index
a 1 1 1
b 2 3 2
c 3 4 NaN
Note that for some left merge operations, you may end up with more rows than in a
when there are multiple matches between a
and b
. In this case, you may need to drop duplicates.
You can make a copy of index on left dataframe and do merge.
a['copy_index'] = a.index
a.merge(b, how='left')
I found this simple method very useful while working with large dataframe and using pd.merge_asof()
(or dd.merge_asof()
).
This approach would be superior when resetting index is expensive (large dataframe).
There is a non-pd.merge solution using Series.map
and DataFrame.set_index
.
In: a['col2'] = a['to_merge_on'].map(b.set_index('to_merge_on')['col2']))
In: a['col2']
Out:
col1 to_merge_on col2
a 1 1 1.0
b 2 3 2.0
c 3 4 NaN
This doesn't introduce a dummy index
name for the index.
Note however that there is no DataFrame.map
method, and so this approach is not for multiple columns.
df1 = df1.merge(df2, how="inner", left_index=True, right_index=True)
This allows to preserve the index of df1
Assuming that the resulting df has the same number of rows and order as your first df, you can do this:
c = pd.merge(a, b, on='to_merge_on')
c.set_index(a.index,inplace=True)
another simple option is to rename the index to what was before:
a.merge(b, how="left").set_axis(a.index)
merge preserves the order at dataframe 'a', but just resets the index so it's save to use set_axis
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With