How to merge all data-frames in a dictionary in Python [duplicate]

I have a dictionary dict that contains many (more than 100) dataframes. Each dataframe contains two variables, name and value_i. For example, the first dataframe in this dictionary, dict[1], looks like the following:

name  value_1
A      1
B      1.1 
C      2

Similarly, the second dataframe in this dictionary, dict[2], looks like the following:

name  value_2
A      1
B      1.1 
D      1.3

I want to merge all dataframes within this dictionary on the common variable name.

The expected outcome should look like the following:

name  value_1  value_2
A     1        1
B     1.1      1.1
C     2        nan
D     nan      1.3

I know I can call pd.merge(dict[i], dict[i-1], how='outer', on='name') many times to merge all dataframes together, but this is too inefficient.

I tried pd.concat(dict.values(), axis=1, join='outer'), but concat doesn't allow me to merge on a key variable.

Can anyone teach me how to do it more efficiently please?

fly36 asked Dec 26 '18 18:12



1 Answer

Given

>>> df1                                                                                                                
  name  value_1
0    A      1.0
1    B      1.1
2    C      2.0
>>> df2                                                                                                                
  name  value_2
0    A      1.0
1    B      1.1
2    D      1.3

and

>>> d = {1: df1, 2: df2} 
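
To reproduce this setup, the two frames and the dictionary could be built roughly as follows. This is only a sketch: the values are copied from the question, and the integer keys 1 and 2 are assumed from its dict[1] / dict[2] notation.

```python
import pandas as pd

# Sample frames copied from the question; each has the shared key column "name".
df1 = pd.DataFrame({"name": ["A", "B", "C"], "value_1": [1.0, 1.1, 2.0]})
df2 = pd.DataFrame({"name": ["A", "B", "D"], "value_2": [1.0, 1.1, 1.3]})

# Integer keys are an assumption, mirroring dict[1] and dict[2] in the question.
d = {1: df1, 2: df2}
```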

you can issue

>>> import pandas as pd
>>> from functools import partial, reduce
>>>
>>> my_reduce = partial(pd.merge, on='name', how='outer')
>>> reduce(my_reduce, d.values())
  name  value_1  value_2
0    A      1.0      1.0
1    B      1.1      1.1
2    C      2.0      NaN
3    D      NaN      1.3
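
As a possible alternative, the pd.concat attempt from the question can be made to align on name, because concat aligns on the index rather than on a column. The sketch below assumes that name is unique within each dataframe (concat along axis=1 does not accept duplicate index labels) and reuses the sample data from the question. The variable names indexed and merged are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["A", "B", "C"], "value_1": [1.0, 1.1, 2.0]})
df2 = pd.DataFrame({"name": ["A", "B", "D"], "value_2": [1.0, 1.1, 1.3]})
d = {1: df1, 2: df2}

# Move the key column into the index so concat can align rows on it.
# Assumption: "name" has no duplicates within any single frame.
indexed = [df.set_index("name") for df in d.values()]

# One outer alignment across all frames; missing combinations become NaN.
merged = (
    pd.concat(indexed, axis=1, join="outer")
    .sort_index()          # make the row order deterministic
    .rename_axis("name")   # ensure the index is labelled before resetting
    .reset_index()
)
print(merged)
```

This aligns all frames in a single call instead of merging pairwise, which may help with a large number of dataframes, though that is worth measuring on the real data.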
timgeb answered Oct 08 '22 09:10