Merge on one column or another

I would like to merge 2 dataframes:

df1:

                                              cik0        cik1        cik2  
'MKTG, INC.'                            0001019056        None        None   
1 800 FLOWERS COM INC                   0001104659  0001437749        None   
11 GOOD ENERGY INC                      0000930413        None        None   
1347 CAPITAL CORP                       0001144204        None        None   
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131        None        None 

df2:

              cik Ticker
0      0001144204   AABB
1      0001019056      A
2      0001387131   AABC
3      0001437749     AA
4      0000930413  AAACU

Expected result:

                                              cik0        cik1  cik2 ticker
'MKTG, INC.'                            0001019056        None  None      A
1 800 FLOWERS COM INC                   0001104659  0001437749  None     AA
11 GOOD ENERGY INC                      0000930413        None  None  AAACU
1347 CAPITAL CORP                       0001144204        None  None   AABB
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131        None  None   AABC

I would like to match cik0 against df2['cik']; if there is no match, I would like to fall back to cik1, and so on.

Thanks for your help!

Roger asked Jan 22 '19 14:01


2 Answers

You can use pd.Series.map with fillna a few times:

ticker_map = df2.set_index('cik')['Ticker']

df1['ticker'] = df1['cik0'].map(ticker_map)\
                           .fillna(df1['cik1'].map(ticker_map))\
                           .fillna(df1['cik2'].map(ticker_map))

This, however, is a bit tedious. You can define a function to do this iteratively:

def apply_map_on_cols(df, cols, mapper):
    s = df[cols[0]].map(mapper)
    for col in cols[1:]:
        s = s.fillna(df[col].map(mapper))
    return s

df1['ticker'] = df1.pipe(apply_map_on_cols,
                        cols=[f'cik{i}' for i in range(3)],
                        mapper=df2.set_index('cik')['Ticker'])
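
As a quick check, the fallback chain can be run against the sample data from the question. The frames below are retyped from the question's printout and assume the CIKs are stored as strings (otherwise the leading zeros would be lost); treat this as a minimal sketch rather than part of the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumption: CIKs are strings).
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413", "0001144204", "0001387131"],
        "cik1": [None, "0001437749", None, None, None],
        "cik2": [None, None, None, None, None],
    },
    index=[
        "'MKTG, INC.'",
        "1 800 FLOWERS COM INC",
        "11 GOOD ENERGY INC",
        "1347 CAPITAL CORP",
        "1347 PROPERTY INSURANCE HOLDINGS, INC.",
    ],
)
df2 = pd.DataFrame(
    {
        "cik": ["0001144204", "0001019056", "0001387131", "0001437749", "0000930413"],
        "Ticker": ["AABB", "A", "AABC", "AA", "AAACU"],
    }
)

# Lookup Series: cik -> Ticker
ticker_map = df2.set_index("cik")["Ticker"]

# Try cik0 first, then fall back to cik1 and cik2 for rows still unmatched.
ticker = df1["cik0"].map(ticker_map)
for col in ["cik1", "cik2"]:
    ticker = ticker.fillna(df1[col].map(ticker_map))
df1["ticker"] = ticker

print(df1["ticker"].tolist())  # ['A', 'AA', 'AAACU', 'AABB', 'AABC']
```

The second row is the interesting one: its cik0 is not in df2, so the ticker comes from cik1.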
jpp answered Oct 22 '22 06:10


Another possibility is to use pd.merge to merge the dataframes:

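import pandas as pd

dfs = []  # partial merge results, one per cik column
df1.reset_index(inplace=True)  # keep the old index as an 'index' column; merge discards the index
for col in ['cik0', 'cik1', 'cik2']:  # iterate over the cik columns only
    # inner merge: keeps the rows whose value in this column is found in df2['cik']
    dfs.append(pd.merge(df1, df2, left_on=col, right_on='cik'))
# concat all partial results (rows without any match are absent,
# and a row matching on several columns appears once per match):
df_result = pd.concat(dfs, axis=0)
df_result.set_index('index', inplace=True)  # restore the old index
df_result.drop('cik', axis=1, inplace=True)  # drop the helper 'cik' column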

This should be several times faster than any map-based method when df1.shape[0] >> df1.shape[1] (that is, when df1 has far more rows than columns), which should hold for most real-world data sets.
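
Because pd.merge defaults to an inner join, the loop above drops rows that match nothing and returns rows in merge order rather than in the original order. One possible variant, sketched here under assumptions, reshapes the cik columns to long form with melt, merges once, and then joins the best match back onto df1. This swaps in a different technique from the answer's loop. The index name "company", the "source" and "rank" helper columns, and the "NO MATCH CORP" row with its CIK are hypothetical, added only for illustration.

```python
import pandas as pd

# Subset of the question's data plus one hypothetical unmatched row.
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0009999999"],
        "cik1": [None, "0001437749", None],
    },
    index=pd.Index(
        ["'MKTG, INC.'", "1 800 FLOWERS COM INC", "NO MATCH CORP"],
        name="company",  # hypothetical index name
    ),
)
df2 = pd.DataFrame({"cik": ["0001019056", "0001437749"], "Ticker": ["A", "AA"]})

cik_cols = ["cik0", "cik1"]

# Long form: one row per (company, cik column) pair, empty cells removed.
long = (
    df1[cik_cols]
    .reset_index()
    .melt(id_vars="company", var_name="source", value_name="cik")
    .dropna(subset=["cik"])
)

# A single merge finds every cik that exists in df2.
matched = long.merge(df2, on="cik", how="inner")

# If a company matches on several columns, keep the earliest one in cik_cols.
priority = {col: rank for rank, col in enumerate(cik_cols)}
matched = (
    matched.assign(rank=matched["source"].map(priority))
    .sort_values("rank")
    .drop_duplicates("company", keep="first")
)

# Left join on the index keeps every row of df1 in its original order.
result = df1.join(matched.set_index("company")["Ticker"].rename("ticker"))
print(result)
```

Unmatched companies end up with a missing ticker instead of disappearing, which may or may not be what you want.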

JE_Muc answered Oct 22 '22 05:10