Pandas: merge multiple dataframes and control column names?

Tags:

pandas

I would like to merge nine Pandas dataframes together into a single dataframe, doing a join on two columns, controlling the column names. Is this possible?

I have nine datasets. All of them have the following columns:

org, name, items,spend

I want to join them into a single dataframe with the following columns:

org, name, items_df1, spend_df1, items_df2, spend_df2, items_df3...

I've been reading the documentation on merging and joining. I can currently merge two datasets together like this:

ad = pd.DataFrame.merge(df_presents, df_trees,
                        on=['practice', 'name'],
                        suffixes=['_presents', '_trees'])

This works great, doing print list(aggregate_data.columns.values) shows me the following columns:

[org', u'name', u'spend_presents', u'items_presents', u'spend_trees', u'items_trees'...]

But how can I do this for nine columns? merge only seems to accept two at a time, and if I do it sequentially, my column names are going to end up very messy.

831

asked Dec 17 '15 15:12

1 Answers

You could use functools.reduce to iteratively apply pd.merge to each of the DataFrames:

result = functools.reduce(merge, dfs)

This is equivalent to

result = dfs[0]
for df in dfs[1:]:
    result = merge(result, df)

To pass the on=['org', 'name'] argument, you could use functools.partial define the merge function:

merge = functools.partial(pd.merge, on=['org', 'name'])

Since specifying the suffixes parameter in functools.partial would only allow one fixed choice of suffix, and since here we need a different suffix for each pd.merge call, I think it would be easiest to prepare the DataFrames column names before calling pd.merge:

for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)

For example,

import pandas as pd
import numpy as np
import functools
np.random.seed(2015)

N = 50
dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)), 
                    columns=['org', 'name', 'items', 'spend']) for i in range(9)]
for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, dfs)
print(result.head())

yields

   org  name  items_df1  spend_df1  items_df2  spend_df2  items_df3  \
0    2     4          4          2          3          0          1   
1    2     4          4          2          3          0          1   
2    2     4          4          2          3          0          1   
3    2     4          4          2          3          0          1   
4    2     4          4          2          3          0          1   

   spend_df3  items_df4  spend_df4  items_df5  spend_df5  items_df6  \
0          3          1          0          1          0          4   
1          3          1          0          1          0          4   
2          3          1          0          1          0          4   
3          3          1          0          1          0          4   
4          3          1          0          1          0          4   

   spend_df6  items_df7  spend_df7  items_df8  spend_df8  items_df9  spend_df9  
0          3          4          1          3          0          1          2  
1          3          4          1          3          0          0          3  
2          3          4          1          3          0          0          0  
3          3          3          1          3          0          1          2  
4          3          3          1          3          0          0          3

191

answered Oct 16 '22 06:10

unutbu

Related questions
                            
                                Reshaping array into a square array Python
                            
                                Curve fit with parameter bounds
                            
                                Wrong fit using scipy curve_fit
                            
                                Debugging gdb pretty printers
                            
                                Perform function on pairs of rows in Pandas dataframe
                            
                                Nonlinear colormap with Matplotlib
                            
                                SQLAlchemy Determine If Unique Constraint Exists
                            
                                Any better way to check string for valid IPv4 and IPv6?
                            
                                Streaming server issue with gunicorn and flask and Nginx
                            
                                passing webcam frames to python opencv from webrtc
                            
                                Custom abort mapping/exceptions in Flask
                            
                                PdfFileReader: PdfReadError: Could not find xref table at specified location
                            
                                Python format timedelta greater than 24 hours for display only containing hours?
                            
                                Python Remove duplicates from list of dictionaries based on a value
                            
                                Python PANDAS: Drop All Rows After First Occurrence of Column Value
                            
                                Dataframe-Normalize each row by row's maximum
                            
                                Counting occurrences of columns in numpy array
                            
                                django view render to template from another app
                            
                                Matplotlib table plot, how to add gap between the graph and table
                            
                                Can you do sums with a datetime in Python?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Pandas: merge multiple dataframes and control column names?

Tags:

python

pandas

Richard

People also ask

1 Answers

unutbu

Recent Activity

Donate For Us