
Merge a list of pandas dataframes

There have been many similar questions, but none that address this specifically.

I have a list of data frames and I need to merge them together using a unique column (date). Field names are different so concat is out.

I can manually use df[0].merge(df[1], on='Date').merge(df[2], on='Date') etc. to merge each df one by one, but the issue is that the number of data frames in the list differs with user input.

Is there a way to merge all the data frames in a list in one go? Or perhaps some for-in loop that does that?

I am using Python 2.7.

Jake asked Jun 29 '16 01:06



1 Answer

You can use the reduce function, where dfList is your list of data frames:

    import pandas as pd
    from functools import reduce

    reduce(lambda x, y: pd.merge(x, y, on='Date'), dfList)
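Since the question also asks about a loop, here is a minimal sketch of the same left-to-right fold written as an explicit for loop. The helper name `merge_all` and the sample frames `prices` and `volumes` are hypothetical and exist only for illustration; the sketch assumes every frame has a `Date` column.

```python
import pandas as pd


def merge_all(frames, on="Date"):
    """Merge every DataFrame in `frames` on a shared key column, left to right."""
    if not frames:
        raise ValueError("frames must contain at least one DataFrame")
    merged = frames[0]
    for frame in frames[1:]:
        # Same step that reduce performs: merge the running result with the next frame.
        merged = pd.merge(merged, frame, on=on)
    return merged


# Hypothetical sample data with a different field name in each frame.
prices = pd.DataFrame({"Date": [1, 2, 3], "Price": [10, 11, 12]})
volumes = pd.DataFrame({"Date": [1, 2, 3], "Volume": [100, 110, 120]})

result = merge_all([prices, volumes])
print(result)
```

The loop and the reduce call should produce the same result; the loop form is just easier to step through or extend with logging.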

As a demo:

    df = pd.DataFrame({'Date': [1,2,3,4], 'Value': [2,3,3,4]})
    dfList = [df, df, df]
    dfList

    # [   Date  Value
    #  0     1      2
    #  1     2      3
    #  2     3      3
    #  3     4      4,
    #     Date  Value
    #  0     1      2
    #  1     2      3
    #  2     3      3
    #  3     4      4,
    #     Date  Value
    #  0     1      2
    #  1     2      3
    #  2     3      3
    #  3     4      4]

    reduce(lambda x, y: pd.merge(x, y, on='Date'), dfList)
    #   Date  Value_x  Value_y  Value
    # 0    1        2        2      2
    # 1    2        3        3      3
    # 2    3        3        3      3
    # 3    4        4        4      4
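The demo reuses one frame, so every date matches and the shared Value column picks up suffixes. A case closer to the question, with a different field name in each frame and only partly overlapping dates, might look like the sketch below. The frames `a`, `b`, `c` and their values are made up for illustration. Note that `pd.merge` defaults to an inner join, so dates missing from any frame are dropped unless `how='outer'` is passed.

```python
from functools import reduce

import pandas as pd

# Made-up frames: each has its own value column and a slightly different date range.
a = pd.DataFrame({"Date": [1, 2, 3], "A": [1.0, 2.0, 3.0]})
b = pd.DataFrame({"Date": [2, 3, 4], "B": [20.0, 30.0, 40.0]})
c = pd.DataFrame({"Date": [3, 4, 5], "C": [300.0, 400.0, 500.0]})
frames = [a, b, c]

# Default behaviour: keep only the dates present in every frame.
inner = reduce(lambda left, right: pd.merge(left, right, on="Date", how="inner"), frames)

# Keep every date seen in any frame; missing values are filled with NaN.
outer = reduce(lambda left, right: pd.merge(left, right, on="Date", how="outer"), frames)

print(inner)
print(outer)
```

Here `inner` keeps only the single date common to all three frames, while `outer` keeps all five dates.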
Psidom answered Sep 17 '22 04:09