Difference(s) between merge() and concat() in pandas

People also ask

What is the difference between merge() and concat() in pandas?

The concat function concatenates DataFrames along rows or columns; we can think of it as stacking multiple DataFrames. The merge function combines DataFrames based on values in shared columns. merge offers more flexibility than concat because it matches rows on key values rather than only on position or index.

What is concat in pandas?

The concat() function is used to concatenate pandas objects along a particular axis with optional set logic along the other axes. Syntax (pandas 1.x): pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True). The join_axes parameter listed in older references was removed in pandas 1.0.
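A minimal sketch of the most commonly used concat() parameters from the signature above. The frames `top` and `bottom` are made up for illustration and do not come from the original answers.

```python
import pandas as pd

# Hypothetical example frames, not taken from the text above.
top = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
bottom = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0 (default): stack rows; the original index labels are kept.
stacked = pd.concat([top, bottom])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# keys= adds an outer index level recording which input each row came from.
labelled = pd.concat([top, bottom], keys=["top", "bottom"])
print(labelled.loc["bottom"])

# axis=1: place the frames side by side, aligning rows on the index.
side_by_side = pd.concat([top, bottom], axis=1)
print(side_by_side.shape)          # (2, 4)
```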

What is difference between concatenate and combine?

The word concatenate is just another way of saying "to combine" or "to join together". In spreadsheet applications, the CONCATENATE function combines text from different cells into one cell, for example joining the text in column A and column B to create a combined name in a new column. This is unrelated to pandas.concat(), which combines whole DataFrames or Series.

Is merge or join faster in pandas?

In the benchmark this answer refers to, merge was slightly faster than join. The per-call difference is small, but over 4000 iterations it adds up to minutes.

At a very high level, merge() combines two DataFrames on the basis of values in common columns (indices can also be used, via left_index=True and/or right_index=True), while concat() appends one or more DataFrames one below the other (or side by side, depending on whether the axis option is set to 0 or 1).

join() merges two DataFrames on the basis of the index; instead of using merge() with left_index=True and right_index=True, we can use join(). Note that join() defaults to a left join, whereas merge() defaults to an inner join.
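The relationship between join() and an index-based merge() can be sketched as follows. The frames `left` and `right` are hypothetical; `how="left"` is passed to merge() explicitly so that both calls use the same join type.

```python
import pandas as pd

# Hypothetical frames that share some index labels.
left = pd.DataFrame({"data1": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"data2": [10, 20]}, index=["a", "b"])

# join() aligns on the index and performs a left join by default.
joined = left.join(right)

# The same operation spelled out with merge().
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))
```

Row "c" has no match in `right`, so its data2 value is NaN in both results.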

For example:

import pandas as pd

df1 = pd.DataFrame({'Key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})

df1:
  Key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6

df2 = pd.DataFrame({'Key': ['a', 'b', 'd'], 'data2': range(3)})

df2:
  Key  data2
0   a      0
1   b      1
2   d      2

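# Merge
# The two DataFrames are merged on the values in column "Key", since it is
# the only column they have in common. The default is an inner join, so
# 'c' (only in df1) and 'd' (only in df2) are dropped.
# Row order can differ between pandas versions.

pd.merge(df1, df2)

  Key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0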

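# Concat
# df2 is appended below df1. Columns missing from one DataFrame are filled
# with NaN, which turns the integer columns into floats.

pd.concat([df1, df2])

  Key  data1  data2
0   b    0.0    NaN
1   b    1.0    NaN
2   a    2.0    NaN
3   c    3.0    NaN
4   a    4.0    NaN
5   a    5.0    NaN
6   b    6.0    NaN
0   a    NaN    0.0
1   b    NaN    1.0
2   d    NaN    2.0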

At a high level:

.concat() simply stacks multiple DataFrames vertically, or stitches them together horizontally after aligning on the index.
.merge() first aligns two DataFrames on selected common column(s) or the index, and then picks up the remaining columns from the aligned rows of each DataFrame.

More specifically, .concat():

Is a top-level pandas function
Combines two or more pandas DataFrames vertically or horizontally
Aligns only on the index when combining horizontally
Can raise an error when combining horizontally if any of the DataFrames contains duplicate index values
Defaults to outer join with the option for inner join
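The index alignment and join options listed above can be sketched as follows. The frames `a` and `b` and their row labels are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap.
a = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])
b = pd.DataFrame({"y": [10, 20, 30]}, index=["r2", "r3", "r4"])

# Default join='outer': keep the union of index labels, fill gaps with NaN.
outer = pd.concat([a, b], axis=1)
print(outer)

# join='inner': keep only the labels present in every input.
inner = pd.concat([a, b], axis=1, join="inner")
print(inner)
```

With these inputs the outer result has four rows (r1 to r4) and the inner result has two (r2 and r3).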

And .merge():

Exists both as a top-level pandas function and a DataFrame method (as of pandas 1.0)
Combines exactly two DataFrames horizontally
Aligns the calling DataFrame's column(s) or index with the other DataFrame's column(s) or index
Handles duplicate values on the joining columns or index by performing a cartesian product
Defaults to inner join with options for left, outer, and right
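A short sketch of the four join types listed above, reusing the df1 and df2 frames from the earlier example. The `row_counts` dictionary is only a helper for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Key": ["b", "b", "a", "c", "a", "a", "b"], "data1": range(7)})
df2 = pd.DataFrame({"Key": ["a", "b", "d"], "data2": range(3)})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(df1, df2, on="Key", how=how)
    row_counts[how] = len(result)
    # Show which keys survive under each join type.
    print(how, len(result), sorted(result["Key"].unique()))

# inner keeps only a and b; left also keeps c; right also keeps d;
# outer keeps all four keys.
print(row_counts)  # {'inner': 6, 'left': 7, 'right': 7, 'outer': 8}
```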

Note that when performing pd.merge(left, right), if left has two rows containing the same values in the joining columns or index, each of those rows is combined with every corresponding row of right, resulting in a cartesian product. On the other hand, if .concat() is used to combine columns (axis=1), we need to make sure that no duplicated index values exist in either DataFrame.
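Both behaviours in the note above can be reproduced with a small sketch. All frames here are hypothetical, and the exact exception type raised by concat() differs between pandas versions, so the example catches a generic Exception.

```python
import pandas as pd

# Duplicate keys on both sides: merge pairs every matching row.
left = pd.DataFrame({"Key": ["a", "a"], "l": [1, 2]})
right = pd.DataFrame({"Key": ["a", "a", "a"], "r": [10, 20, 30]})
product = pd.merge(left, right, on="Key")
print(len(product))  # 6, i.e. 2 rows x 3 rows

# Duplicate index labels: horizontal concat cannot align the rows.
dup = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 0, 1])
other = pd.DataFrame({"y": [4, 5, 6]}, index=[0, 1, 2])
concat_error = None
try:
    pd.concat([dup, other], axis=1)
except Exception as exc:  # exception class varies across pandas versions
    concat_error = exc
print(type(concat_error).__name__)
```

If the cartesian product is unintended, merge() accepts a validate argument (for example validate="one_to_one") that raises instead of silently multiplying rows.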

Practically speaking:

Consider .concat() first when combining homogeneous DataFrames, and consider .merge() first when combining complementary DataFrames.
If you need to combine vertically, go with .concat(). If you need to combine horizontally via columns, go with .merge(), which by default merges on the columns the two DataFrames have in common.

Reference: Pandas 1.x Cookbook

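pd.concat takes an iterable (such as a list) of pandas objects as its first argument, so the DataFrames must be wrapped in a list rather than passed directly. The labels along the other axis do not have to match: with the default join='outer', missing entries are filled with NaN, while join='inner' keeps only the shared labels.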
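A minimal sketch of the iterable requirement, using two hypothetical frames `a` and `b` whose columns only partly overlap. It also shows what happens to the non-shared columns under the default outer join.

```python
import pandas as pd

# Hypothetical frames with different column sets.
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5], "z": [6]})

# Passing a bare DataFrame instead of a list raises TypeError.
try:
    pd.concat(a)
    accepted_bare_frame = True
except TypeError:
    accepted_bare_frame = False
print(accepted_bare_frame)  # False

# Wrapping the frames in a list works; columns that are missing from one
# input are filled with NaN under the default join='outer'.
combined = pd.concat([a, b], ignore_index=True)
print(combined)
```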

pd.merge takes the two DataFrames directly as arguments and combines them by matching rows on shared columns or on the index. pd.concat with axis=1 is not a substitute here, because it only places the DataFrames side by side and the shared column appears twice in the result.
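The repeated-column effect can be seen in a small sketch. The frames `left` and `right` are hypothetical and share the column "Key".

```python
import pandas as pd

left = pd.DataFrame({"Key": ["a", "b"], "data1": [1, 2]})
right = pd.DataFrame({"Key": ["a", "b"], "data2": [3, 4]})

# concat just glues the columns together, so "Key" shows up twice.
side = pd.concat([left, right], axis=1)
print(side.columns.tolist())    # ['Key', 'data1', 'Key', 'data2']

# merge matches rows on "Key" and keeps a single copy of it.
merged = pd.merge(left, right, on="Key")
print(merged.columns.tolist())  # ['Key', 'data1', 'data2']
```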

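join, in contrast, combines two DataFrames by aligning their indices, which do not have to be identical; with the default left join, rows of the calling DataFrame that have no match get NaN in the added columns.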

I am currently trying to understand the essential difference(s) between pd.DataFrame.merge() and pd.concat().

Nice question. The main difference:

`pd.concat` works on both axes.

The other difference is that pd.concat supports only outer (default) and inner joins, while pd.DataFrame.merge() supports left, right, outer, and inner (default) joins.

A third notable difference: pd.DataFrame.merge() lets you set the column suffixes used when both DataFrames contain non-key columns with the same name, while pd.concat has no such option.
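A sketch of the suffix handling, using hypothetical frames that both contain a non-key column called "value". The keys= call at the end is one way to tell the inputs apart with concat; it is shown as an alternative, not as an equivalent of suffixes.

```python
import pandas as pd

left = pd.DataFrame({"Key": ["a", "b"], "value": [1, 2]})
right = pd.DataFrame({"Key": ["a", "b"], "value": [3, 4]})

# Default suffixes are "_x" and "_y".
default = pd.merge(left, right, on="Key")
print(default.columns.tolist())  # ['Key', 'value_x', 'value_y']

# Custom suffixes make the origin of each column explicit.
custom = pd.merge(left, right, on="Key", suffixes=("_left", "_right"))
print(custom.columns.tolist())   # ['Key', 'value_left', 'value_right']

# concat has no suffixes argument; keys= adds an outer column level instead.
labelled = pd.concat([left, right], axis=1, keys=["left", "right"])
print(labelled.columns.tolist())
```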

By default (axis=0), pd.concat stacks the rows of multiple DataFrames. With axis=1 it places them side by side, aligned on the index, which resembles pd.DataFrame.merge() on the index.
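To illustrate how far the resemblance goes, the sketch below compares a horizontal concat with an index-based merge. The frames are hypothetical and have unique index labels; join="inner" is passed to concat so that both calls use the same join type.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3]}, index=["r1", "r2", "r3"])
b = pd.DataFrame({"y": [10, 20]}, index=["r2", "r3"])

# Side-by-side concat, keeping only index labels present in both frames.
via_concat = pd.concat([a, b], axis=1, join="inner")

# Index-on-index merge; how='inner' is the merge default.
via_merge = a.merge(b, left_index=True, right_index=True)

print(via_concat)
print(via_merge)
```

Both results contain the rows r2 and r3 with the columns x and y. The similarity only holds for index alignment; concat cannot match rows on column values.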

Some useful examples of pd.concat:

df2 = pd.concat([df] * 2, ignore_index=True)  # double the rows of a DataFrame

df2 = pd.concat([df, df.iloc[[0]]])  # append a copy of the first row to the end

df3 = pd.concat([df1, df2], join='inner', ignore_index=True)  # stack two DataFrames, keeping only their shared columns
