Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging multiple dataframes on column

I am trying to merge/join multiple Dataframes and so far I have no luck. I've found merge method, but it works only with two Dataframes. I also found this SO answer suggesting to do something like that:

df1.merge(df2,on='name').merge(df3,on='name')

Unfortunatelly it will not work in my case, because I have 20+ number of dataframes.

My next idea was to use join. According to the reference when joining multiple dataframes I need to use list and only I can join on index column. So I changed indexes for all of the columns (ok, it can be done grammatically easily) and end up with something like this:

df.join([df1,df2,df3])

Unfortunately, also this approach failed, because other columns names are this same in all dataframes. I've decided to do the last thing, that is renaming all columns. But when I finally joined everything: df = pd.Dataframe() df.join([df1,df2,df3])

I've received empty dataframe. I have no more idea, how I can join them. Can someone suggest anything more?

EDIT1:

Sample input:

import pandas as pd

df1 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr1', 'attr2'])

df1 
  name attr1 attr2
0    a     5    19
1    b    14    16
2    c     4     9

df2
  name attr1 attr2
0    a    15    49
1    b     4    36
2    c    14     9

Expected output:

df
  name attr1_1 attr2_1 attr1_2 attr2_2
0    a     5    19      15      49
1    b    14    16      4       36
2    c     4     9      14      9

Indexes might be unordered between dataframes, but it is guaranteed, that they will exists.

like image 967
sebap123 Avatar asked Feb 06 '23 13:02

sebap123


2 Answers

use pd.concat

dflist = [df1, df2]
keys = ["%d" % i for i in range(1, len(dflist) + 1)]

merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = merged.swaplevel(0, 1, 1).columns.to_series().str.join('_')

merged

enter image description here

Or

merged.reset_index()

enter image description here

like image 180
piRSquared Avatar answered Feb 11 '23 17:02

piRSquared


use reduce:

def my_merge(df1, df2):
    return df1.merge(df2,on='name')

final_df = reduce(my_merge, df_list)

considering df_list to be a list of your dataframes

like image 45
Nik P Avatar answered Feb 11 '23 17:02

Nik P