How to keep index when using pandas merge

People also ask

How do you keep index on PD merge?

By default, pandas merge creates a new integer index for the merged DataFrame. To preserve the index of the first DataFrame as the index of the merged DataFrame, we can set it explicitly using .set_axis(df1.

Does Pandas Groupby preserve index?

The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.

Does PD concat merge on index?

pd.concat joins on the index and can join two or more DataFrames at once. It does a full outer join by default.

How do I merge index columns in Pandas?

The merge() function is used to merge DataFrame or named Series objects with a database-style join. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.

In [5]: a.reset_index().merge(b, how="left").set_index('index')
Out[5]:
       col1  to_merge_on  col2
index
a         1            1   1.0
b         2            3   2.0
c         3            4   NaN

Note that a left merge can return more rows than a contains when a key in a matches several rows in b. In that case, you may need to drop duplicates.
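
The caveat above can be sketched as follows. The frames a and b are not defined anywhere in this thread, so the sample data is an assumption chosen to match the output shown, and b_dup is a hypothetical frame invented to trigger the duplicate case.

```python
import pandas as pd

# Assumed sample data (not from the original post), shaped to match the output above.
a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]}, index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# Move the unnamed index into a column called "index", merge, then restore it.
merged = a.reset_index().merge(b, how="left").set_index("index")

# Hypothetical right frame in which key 1 appears twice.
b_dup = pd.DataFrame({"col2": [1, 9, 2], "to_merge_on": [1, 1, 3]})
merged_dup = a.reset_index().merge(b_dup, how="left").set_index("index")

# Label "a" now appears twice; keep only the first match per original label.
deduped = merged_dup[~merged_dup.index.duplicated(keep="first")]
print(deduped)
```

Which duplicate to keep depends on the data; keeping the first match is only one possible choice.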

You can copy the index of the left DataFrame into a column and then merge.

a['copy_index'] = a.index
a.merge(b, how='left')

I found this simple method very useful when working with large DataFrames and pd.merge_asof() (or dd.merge_asof()).

This approach is preferable when resetting the index is expensive, as it can be for a large DataFrame.
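
A minimal sketch of the copied-index approach with pd.merge_asof follows. The frames, the column names t, v and w, and the index labels are all made up for illustration; only the copy_index idea comes from the answer above.

```python
import pandas as pd

# Hypothetical data: merge_asof requires both frames to be sorted by the key.
left = pd.DataFrame({"t": [1, 5, 10], "v": [10, 20, 30]}, index=["x", "y", "z"])
right = pd.DataFrame({"t": [2, 6], "w": [100, 200]})

# Carry the original index along as an ordinary column (assign leaves `left` untouched).
with_copy = left.assign(copy_index=left.index)

# Default direction is "backward": each left row takes the last right row with t <= left t.
out = (
    pd.merge_asof(with_copy, right, on="t")
    .set_index("copy_index")   # restore the original labels
    .rename_axis(None)         # drop the helper name from the index
)
print(out)
```

The final set_index and rename_axis calls are optional; they only put the copied labels back in place as the index.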

There is a non-pd.merge solution using Series.map and DataFrame.set_index.

In: a['col2'] = a['to_merge_on'].map(b.set_index('to_merge_on')['col2'])
In: a
Out:
   col1  to_merge_on  col2
a     1            1   1.0
b     2            3   2.0
c     3            4   NaN

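This avoids introducing a dummy name for the index.

Note, however, that Series.map looks up a single key column, so this approach does not extend to merging on multiple columns. (At the time of the answer there was no DataFrame.map method; the DataFrame.map added in pandas 2.1 applies a function elementwise and is not a lookup.)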
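
Here is a self-contained sketch of the Series.map approach. The sample frames are assumptions chosen to match the output above, and the uniqueness check is an addition not found in the original answer: mapping through a Series whose index has duplicate keys fails, so it seems safer to check first.

```python
import pandas as pd

# Assumed sample data (not from the original post).
a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]}, index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# Lookup Series: index is the merge key, values are the column to bring over.
lookup = b.set_index("to_merge_on")["col2"]

# Series.map needs unique keys in the lookup, so fail early with a clear message.
if not lookup.index.is_unique:
    raise ValueError("to_merge_on must be unique in b for a map-based lookup")

# assign returns a new frame; the index of a is carried over unchanged.
result = a.assign(col2=a["to_merge_on"].map(lookup))
print(result)
```

For several key columns, one of the merge-based answers is likely the simpler route.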

df1 = df1.merge(df2, how="inner", left_index=True, right_index=True)

This preserves the index of df1, because the join is performed on the indexes themselves rather than on columns.
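
A short sketch of the index-on-index merge above, with made-up frames (df1, df2 and their columns x and y are illustrative only). It also shows a left join for comparison, since an inner join keeps only the labels present in both frames.

```python
import pandas as pd

# Hypothetical frames that share some index labels.
df1 = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"y": [10, 30]}, index=["a", "c"])

# Inner join on the indexes: only labels found in both frames survive.
inner = df1.merge(df2, how="inner", left_index=True, right_index=True)

# Left join on the indexes: every label of df1 is kept, with NaN where df2 has no match.
left = df1.merge(df2, how="left", left_index=True, right_index=True)

print(inner)
print(left)
```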

Assuming the merged DataFrame has the same number of rows, in the same order, as your first DataFrame, you can do this:

c = pd.merge(a, b, on='to_merge_on')
c.set_index(a.index, inplace=True)

Another simple option is to set the index back to what it was before:

a.merge(b, how="left").set_axis(a.index)

A left merge preserves the row order of a and only resets the index, so it is safe to use set_axis, provided no key in a matches more than one row in b.
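
One way to guard the set_axis approach is sketched below. The sample frames are assumptions, and the validate argument is an addition to the answer's one-liner: it makes the merge raise an error when a key is duplicated in b, which is exactly the case where the merged frame would be longer than a.

```python
import pandas as pd

# Assumed sample data (not from the original post).
a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]}, index=["a", "b", "c"])
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# validate="many_to_one" raises pandas.errors.MergeError if keys repeat in b,
# so the merged frame is guaranteed to have one row per row of a.
c = a.merge(b, how="left", on="to_merge_on", validate="many_to_one").set_axis(a.index)
print(c)
```

Without the check, a duplicated key in b would surface later as a length mismatch in set_axis.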
