Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: merge multiple dataframes and control column names?

Tags:

python

pandas

I would like to merge nine Pandas dataframes together into a single dataframe, doing a join on two columns, controlling the column names. Is this possible?

I have nine datasets. All of them have the following columns:

org, name, items,spend

I want to join them into a single dataframe with the following columns:

org, name, items_df1, spend_df1, items_df2, spend_df2, items_df3...

I've been reading the documentation on merging and joining. I can currently merge two datasets together like this:

ad = pd.DataFrame.merge(df_presents, df_trees,
                        on=['practice', 'name'],
                        suffixes=['_presents', '_trees'])

This works great, doing print list(aggregate_data.columns.values) shows me the following columns:

[org', u'name', u'spend_presents', u'items_presents', u'spend_trees', u'items_trees'...]

But how can I do this for nine columns? merge only seems to accept two at a time, and if I do it sequentially, my column names are going to end up very messy.

like image 831
Richard Avatar asked Dec 17 '15 15:12

Richard


People also ask

How do I change the column names in merging pandas?

Different column names are specified for merges in Pandas using the “left_on” and “right_on” parameters, instead of using only the “on” parameter. Merging dataframes with different names for the joining variable is achieved using the left_on and right_on arguments to the pandas merge function.

How do I merge two Dataframes in pandas based on multiple common column?

To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.

Can you merge more than 2 Dataframes in pandas?

We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .


1 Answers

You could use functools.reduce to iteratively apply pd.merge to each of the DataFrames:

result = functools.reduce(merge, dfs)

This is equivalent to

result = dfs[0]
for df in dfs[1:]:
    result = merge(result, df)

To pass the on=['org', 'name'] argument, you could use functools.partial define the merge function:

merge = functools.partial(pd.merge, on=['org', 'name'])

Since specifying the suffixes parameter in functools.partial would only allow one fixed choice of suffix, and since here we need a different suffix for each pd.merge call, I think it would be easiest to prepare the DataFrames column names before calling pd.merge:

for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)

For example,

import pandas as pd
import numpy as np
import functools
np.random.seed(2015)

N = 50
dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)), 
                    columns=['org', 'name', 'items', 'spend']) for i in range(9)]
for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, dfs)
print(result.head())

yields

   org  name  items_df1  spend_df1  items_df2  spend_df2  items_df3  \
0    2     4          4          2          3          0          1   
1    2     4          4          2          3          0          1   
2    2     4          4          2          3          0          1   
3    2     4          4          2          3          0          1   
4    2     4          4          2          3          0          1   

   spend_df3  items_df4  spend_df4  items_df5  spend_df5  items_df6  \
0          3          1          0          1          0          4   
1          3          1          0          1          0          4   
2          3          1          0          1          0          4   
3          3          1          0          1          0          4   
4          3          1          0          1          0          4   

   spend_df6  items_df7  spend_df7  items_df8  spend_df8  items_df9  spend_df9  
0          3          4          1          3          0          1          2  
1          3          4          1          3          0          0          3  
2          3          4          1          3          0          0          0  
3          3          3          1          3          0          1          2  
4          3          3          1          3          0          0          3  
like image 191
unutbu Avatar answered Oct 16 '22 06:10

unutbu