Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Nested merges in pandas with suffixes

Tags:

python

pandas

I'm trying to merge multiple dataframes in pandas and keep the column labels straight in the resulting dataframe. Here's my test case:

import pandas as pd
df1 = pd.DataFrame(data = [[1,1],[3,1],[5,1]], columns = ['key','val'])
df2 = pd.DataFrame(data = [[1,2],[3,2],[7,2]], columns = ['key','val'])
df3 = pd.DataFrame(data = [[1,3],[2,3],[4,3]], columns = ['key','val'])
df = pd.merge(pd.merge(df1,df2,on='key', suffixes=['_1','_2']),df3,on='key',suffixes=[None,'_3'])

I'm getting this:

df =
     key    val_1   val_2   val
0     1       1      2       3

I'd like to see this:

df =
     key    val_1   val_2   val_3
0     1       1      2       3

The last pair of suffixes that I've specified is: [None,'_3'], the logic being that the pair ['_1','_2'] has created unique column names for the previous merge.

like image 842
EMiller Avatar asked Mar 10 '17 18:03

EMiller


2 Answers

The suffix is needed only when the merged dataframe has two columns with same name. When you merge df3, your dataframe has column names val_1 and val_2 so there is no overlap. You can handle that by renaming val to val_3 like this

df = df1.merge(df2, on = 'key', suffixes=['_1','_2']).merge(df3, on = 'key').rename(columns = {'val': 'val_3'})
like image 186
Vaishali Avatar answered Nov 07 '22 16:11

Vaishali


you have to try this on

df = pd.merge(pd.merge(df1,df2,on='key', suffixes=[None,'_2']),df3,on='key',suffixes=['_1,'_3'])

it's work for me

like image 45
Salbi Faza Avatar answered Nov 07 '22 18:11

Salbi Faza