Merge two dataframes by index

Use merge, which is an inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which is a left join by default:

df1.join(df2)

Or concat, which is an outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

import pandas as pd

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

# Default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

# Default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

# Default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0
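The defaults shown above are only defaults. As a minimal sketch, reusing the same df1 and df2 from the samples, each function can be asked for a different join type: merge and join take a how argument, and concat takes a join argument (which accepts only 'inner' or 'outer').

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(6),
                    'b': [5, 3, 6, 9, 2, 4]}, index=list('abcdef'))
df2 = pd.DataFrame({'c': range(4),
                    'd': [10, 20, 30, 40]}, index=list('abhi'))

# Inner join on the index, three ways: only labels present in both frames survive
inner_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
inner_join = df1.join(df2, how='inner')
inner_concat = pd.concat([df1, df2], axis=1, join='inner')

# Outer join on the index, three ways: every label from either frame is kept
outer_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
outer_join = df1.join(df2, how='outer')
outer_concat = pd.concat([df1, df2], axis=1, join='outer')

print(sorted(inner_merge.index))
print(sorted(outer_merge.index))
```

Which spelling to use is mostly a matter of taste for index-on-index joins; merge becomes the natural choice once columns are involved as keys.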

Use pd.concat([df1, df2, ...], axis=1) to concatenate two or more DataFrames aligned on their indexes:

pd.concat([df1, df2, df3, ...], axis=1)

Or use merge to join on custom columns or indexes:

# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1','col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1', right_index=True)

Or use join to join on the index:

df1.join(df2)
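To make the column-to-index case concrete, here is a small sketch. The frames orders and labels, and the column names qty and name, are made up for illustration; only col1 comes from the snippet above.

```python
import pandas as pd

# Hypothetical data: `col1` in the left frame holds keys that
# appear as index labels in the right frame.
orders = pd.DataFrame({'col1': ['x', 'y', 'x', 'z'],
                       'qty': [1, 2, 3, 4]})
labels = pd.DataFrame({'name': ['X-ray', 'Yankee']}, index=['x', 'y'])

# Inner join (the default): the 'z' row is dropped because 'z'
# is not in labels.index
merged = pd.merge(orders, labels, left_on='col1', right_index=True)

# Left join: the 'z' row is kept and its `name` is NaN
merged_left = pd.merge(orders, labels, left_on='col1', right_index=True,
                       how='left')

print(merged)
print(merged_left)
```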

By default:
join is a column-wise left join
pd.merge is a column-wise inner join
pd.concat is a row-wise outer join

pd.concat:
takes a single iterable of DataFrames, so the frames cannot be passed as separate arguments (use [df1, df2])
labels along the other axis should match; with the default outer join, any that do not are filled with NaN

join and pd.merge:
take DataFrames directly as arguments
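A short sketch of the pd.concat points above. The frames top and bottom are hypothetical; the example shows the default row-wise outer behaviour, the join='inner' alternative, and the error raised when a DataFrame is passed instead of a list.

```python
import pandas as pd

# Hypothetical frames that share only column 'b'
top = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
bottom = pd.DataFrame({'b': [5], 'c': [6]})

# Default: axis=0 (rows are stacked), join='outer' (union of columns,
# gaps filled with NaN)
stacked = pd.concat([top, bottom])

# join='inner' keeps only the columns common to every frame
common = pd.concat([top, bottom], join='inner')

# Passing a DataFrame directly instead of a list is rejected
try:
    pd.concat(top)
    raised = False
except TypeError:
    raised = True

print(stacked)
print(common)
print(raised)
```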

A silly bug that got me: the joins failed because index dtypes differed. This was not obvious as both tables were pivot tables of the same original table. After reset_index, the indices looked identical in Jupyter. It only came to light when saving to Excel...

I fixed it with: df1[['key']] = df1[['key']].apply(pd.to_numeric)

Hopefully this saves somebody an hour!
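A rough sketch of how one might catch this kind of mismatch before joining. The frames left and right are invented for illustration, and the example converts the index itself rather than a key column. The point is that the integer 1 and the string '1' print the same but never match as labels.

```python
import pandas as pd

# Hypothetical frames: the labels look identical when printed,
# but one index holds integers and the other holds strings.
left = pd.DataFrame({'x': [10, 20]}, index=[1, 2])
right = pd.DataFrame({'y': [30, 40]}, index=['1', '2'])

# Comparing dtypes up front exposes the problem
same_dtype = left.index.dtype == right.index.dtype
print(left.index.dtype, right.index.dtype, same_dtype)

# No label is shared, because 1 != '1'
shared_before = set(left.index) & set(right.index)

# Convert the string labels to numbers, then join as usual
right.index = pd.to_numeric(right.index)
shared_after = set(left.index) & set(right.index)
fixed = left.join(right)

print(fixed)
```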

Tags: python, merge, concat, pandas, dataframe