Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can you join dataframes with multiple keys in one of the joining columns?

I would like to join the following two dataframes.

The first dataframe has multiple keys in one column

>>> import pandas as pd
>>> df = pd.DataFrame(data={'col1': [1,2,3], 'key': ['x, y','y', 'z, x']})
>>> df
   col1   key
0     1  x, y
1     2     y
2     3  z, x

For each of the key in the first dataframe i have a mapping of sorts in a second dataframe. Like this:

>>> df2 = pd.DataFrame(data= {'key': ['x','y','z'], 'value': ["v1,v2, 
v3","v4,v3", "v5"]})

>>> df2
  key      value
0   x  v1,v2, v3
1   y      v4,v3
2   z         v5

I would like to end up with all values next to their corresponding keys in one column. Ideally with duplicates removed as in col1 (x and y both have v3).

>>> df3
   col1   key           value
0     1  x, y  v1, v2, v3, v4
1     2     y          v4, v3
2     3  z, x  v1, v2, v3, v5
like image 387
Johan Avatar asked Jan 27 '23 12:01

Johan


2 Answers

Check with

d=dict(zip(df2.key,df2.value))
df['New']=[','.join([d.get(y) for y in x.split(', ')]) for x in df.key]

and now we remove the duplicate

df.New=df.New.str.split(',').apply(lambda x : ','.join(set(x)))
df
   col1   key          New
0     1  x, y  v3,v1,v2,v4
1     2     y        v3,v4
2     3  z, x  v5,v3,v1,v2
like image 75
BENY Avatar answered Jan 30 '23 13:01

BENY


Simple for loop

for k,v in zip(df2.key, df2.value): 
    df.key = df.key.str.replace(k,v)

Outputs

    col1    key
0   1       v1,v2, v3, v4,v3
1   2       v4,v3
2   3       v5, v1,v2, v3

To remove the duplicates, can transform

df.key.transform(lambda s: sorted(set([k.strip() for k in s.split(',')])))

    col1    key
0   1       [v1, v2, v3, v4]
1   2       [v3, v4]
2   3       [v1, v2, v3, v5]
like image 23
rafaelc Avatar answered Jan 30 '23 12:01

rafaelc