
How do I combine two dataframes?

Tags:

python

pandas

People also ask

How do I merge two data frames?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
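To make the difference between the two concrete, here is a minimal sketch. The frames and the column names (key, x, y) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical example data; column names are assumptions for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat stacks the rows of one frame under the other;
# columns missing from a frame are filled with NaN.
stacked = pd.concat([left, right], ignore_index=True)

# merge matches rows on the key column, like a SQL JOIN.
inner = pd.merge(left, right, on="key", how="inner")     # keys present in both
left_join = pd.merge(left, right, on="key", how="left")  # every key from `left`
```

With this data, stacked has six rows, inner keeps only the keys b and c, and left_join keeps all three keys from left with a missing y for key a.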

Which function is used for combining 2 DataFrames?

concat() for combining DataFrames across rows or columns.

How do you merge DataFrames in Python?

Merge DataFrames Using concat()

Here are the most commonly used parameters for the concat() function:

objs is the list of DataFrame objects ([df1, df2, ...]) to be concatenated.

axis defines the direction of the concatenation, 0 for row-wise and 1 for column-wise.

join can either be inner (intersection) or outer (union ...
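As a rough illustration of the join parameter, the sketch below concatenates two small frames row-wise with each setting. The frames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical frames that share only column "b".
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join="outer" keeps the union of the columns (gaps become NaN).
outer = pd.concat([df1, df2], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every frame.
inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)
```

Here outer ends up with columns a, b and c, while inner keeps only b.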


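I believe you can use the append method (note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this only works on older versions):

bigdata = data1.append(data2, ignore_index=True)

To keep their original indexes, just don't pass the ignore_index keyword ...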


You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
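A small sketch of what ignore_index changes, using two made-up frames named after the ones in the question:

```python
import pandas as pd

# Hypothetical stand-ins for data1 and data2.
data1 = pd.DataFrame({"a": [1, 2]})
data2 = pd.DataFrame({"a": [3, 4]})

# Default: each frame keeps its own index labels, so labels can repeat.
kept = pd.concat([data1, data2])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([data1, data2], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels are legal in pandas, but they can make label-based lookups such as .loc return several rows, which is usually the reason to renumber.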

I thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To merge across columns, you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
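One thing to watch with axis=1 is that rows are lined up by index label, not by position. A minimal sketch with hypothetical frames whose indexes only partly overlap:

```python
import pandas as pd

# Hypothetical frames; the index labels overlap only on 1 and 2.
df_a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"y": [10, 20, 30]}, index=[1, 2, 3])

# Rows are aligned on the index; unmatched labels produce NaN.
df_col_merged = pd.concat([df_a, df_b], axis=1)

# To place the frames side by side by position instead,
# reset both indexes first.
side_by_side = pd.concat(
    [df_a.reset_index(drop=True), df_b.reset_index(drop=True)], axis=1
)
```

In this example df_col_merged has four rows with two missing values, while side_by_side has three fully populated rows.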

If you're working with big data and need to concatenate multiple datasets, calling concat many times can get performance-intensive.

If you don't want to create a new df each time, you can instead aggregate the changes and call concat only once:

frames = [df_A, df_B]  # Or perform operations on the DFs
result = pd.concat(frames)

This is pointed out in the pandas docs, at the bottom of the section on concatenating objects:

Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
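A sketch of the pattern the note recommends: build all the frames with a list comprehension and call concat once. The load_chunk helper is hypothetical and stands in for whatever reads or computes each dataset.

```python
import pandas as pd


def load_chunk(i):
    """Hypothetical loader; stands in for reading or computing one dataset."""
    return pd.DataFrame({"chunk": [i, i], "value": [i * 10, i * 10 + 1]})


# Build every frame first, then concatenate once at the end,
# instead of growing a DataFrame inside the loop.
frames = [load_chunk(i) for i in range(3)]
result = pd.concat(frames, ignore_index=True)
```

This way the data is copied once at the end, rather than once per iteration as it would be if result were rebuilt with concat inside a loop.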