Looping through df dictionary in order to merge df's in Pandas

I have the following dictionary of DataFrames:

A = pd.DataFrame([[2, 1], [2, 1], [2, 1]], columns=['A', 'B'], index = [1, 2, 3])
B = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=['A', 'B'], index = [1, 2, 3])
C = pd.DataFrame([[1, 2], [1, 2], [1, 2]], columns=['A', 'B'], index = [1, 2, 3])

df_all = {'df1': A, 'df2': B, 'df3': C}

I would like to inner-merge them on their indexes, but iteratively, using a for loop. The result should be equivalent to:

df4 = pd.merge(A, B, left_index=True, right_index=True, how='inner')
df5 = pd.merge(df4, C, left_index=True, right_index=True, how='inner')

And the result would look like:

   A_x  B_x  A_y  B_y  A  B
1    2    1    1    1  1  2
2    2    1    2    2  1  2
3    2    1    3    3  1  2

I tried something silly like

for key, value in df_all.iteritems():
    df = pd.merge(value, value, left_index=True, right_index=True, how='inner')

But this gives me a nonsense result.

I appreciate the help.

hernanavella asked Sep 04 '14 01:09


1 Answer

import pandas as pd
import functools

A = pd.DataFrame([[2, 1], [2, 1], [2, 1]], columns=['A', 'B'], index = [1, 2, 3])
B = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=['A', 'B'], index = [1, 2, 3])
C = pd.DataFrame([[1, 2], [1, 2], [1, 2]], columns=['A', 'B'], index = [1, 2, 3])

df_all = {'df1': A, 'df2': B, 'df3': C}
merge = functools.partial(pd.merge, left_index=True, right_index=True, how='inner')
df = functools.reduce(merge, df_all.values())
print(df)

yields

   A_x  B_x  A_y  B_y  A  B
1    2    1    1    2  1  1
2    2    1    1    2  2  2
3    2    1    1    2  3  3

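Note that in Python versions before 3.7, df_all.values() returns the values in the dict in an unspecified order, which is why the output above merges C before B. From Python 3.7 on, dicts preserve insertion order, so the merge follows the order in which the keys were added. If you want a particular order regardless, you'll have to do something like sort by the keys.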
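Since the question asked for a for loop, here is a minimal sketch of the same merge written as an explicit loop over the sorted keys. The out-of-order dict and the names keys and result are only for illustration; the loop does the same left-to-right fold that functools.reduce does above.

```python
import pandas as pd

A = pd.DataFrame([[2, 1], [2, 1], [2, 1]], columns=['A', 'B'], index=[1, 2, 3])
B = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=['A', 'B'], index=[1, 2, 3])
C = pd.DataFrame([[1, 2], [1, 2], [1, 2]], columns=['A', 'B'], index=[1, 2, 3])

# Keys inserted out of order on purpose, to show that sorting fixes the order.
df_all = {'df3': C, 'df1': A, 'df2': B}

# Sort the keys so the merge order is df1, df2, df3 regardless of insertion order.
keys = sorted(df_all)

# Start from the first DataFrame and inner-merge the rest onto it by index.
result = df_all[keys[0]]
for key in keys[1:]:
    result = pd.merge(result, df_all[key],
                      left_index=True, right_index=True, how='inner')

print(result)
```

With the keys sorted, the columns come out in the order the question expects (A_x, B_x from df1, A_y, B_y from df2, then A, B from df3).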


Or, you could create a DataFrame with hierarchical columns using pd.concat:

df = pd.concat(df_all, axis=1).dropna(axis=0)
print(df)

yields

   df1     df2     df3   
     A  B    A  B    A  B
1    2  1    1  1    1  2
2    2  1    2  2    1  2
3    2  1    3  3    1  2

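(Caveat: Using pd.concat with dropna is fragile here. It assumes the DataFrames contain no NaN values of their own, though they may have different indexes. Under that assumption, dropna removes exactly the rows missing from at least one DataFrame, which produces an inner join.)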
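If the DataFrames might contain NaN values of their own, one possible alternative is to let pd.concat do the inner join itself through its join='inner' parameter instead of relying on dropna. The sketch below is only an illustration: the DataFrame D, with a shifted index and one NaN, is made up for the example and is not part of the question's data.

```python
import pandas as pd

A = pd.DataFrame([[2, 1], [2, 1], [2, 1]], columns=['A', 'B'], index=[1, 2, 3])
# Hypothetical DataFrame: different index, and a genuine NaN in row 2.
D = pd.DataFrame([[1.0, float('nan')], [2.0, 2.0], [3.0, 3.0]],
                 columns=['A', 'B'], index=[2, 3, 4])

frames = {'df1': A, 'df2': D}

# join='inner' keeps only index labels present in every DataFrame,
# so rows are selected by index and existing NaN values are left alone.
inner = pd.concat(frames, axis=1, join='inner')

# The dropna approach also discards row 2, because D already had a NaN there.
dropped = pd.concat(frames, axis=1).dropna(axis=0)

print(inner)
print(dropped)
```

Here inner keeps rows 2 and 3 (the labels shared by both indexes), while dropped keeps only row 3.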

unutbu answered Sep 23 '22 20:09