Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Substring Merge on a column with special characters

I want to merge two dfs over 'Name' columns. However, one of them can be substring or exactly equal to other column. For this, I have used

df1 = pd.DataFrame(
    {
        'Name': ['12,5mg/0,333ml(37,5mg/ml)', 'ad', 'aaa'],
    }
)

df2 = pd.DataFrame(
    {
        'Name': ['12,5mg/0,333ml(37,5mg/ml)', 'ad', 'aaaa'],
    }
)

str_match = "({})".format("|".join(df1.Name))
df2.merge(df1, 
          left_on=df2.Name.str.extract(str_match)[0], 
          right_on="Name", how='outer')

'ad', 'aaa', 'aaaa' values are merged correctly. However, the problem occurs on the value '12,5mg/0,333ml(37,5mg/ml)' (most probably because of special characters). What I got with this code snippet

I have tried re.espace() but it did not help me. How can I solve this problem?

like image 233
icgncl Avatar asked Oct 26 '25 08:10

icgncl


2 Answers

You need to escape any special character. I would also keep only distinct values of the escaped strings (to simplify the regex if there are many repeats) and compile the pattern beforehand:

pattern = re.compile('({})'.format(
    '|'.join(pd.unique(df1.Name.apply(re.escape)))
))
out = df2.merge(
    df1, 
    left_on=df2.Name.str.extract(pattern)[0],
    right_on="Name", how='outer')

>>> out
  Name                       Name_x                     Name_y                    
0  12,5mg/0,333ml(37,5mg/ml)  12,5mg/0,333ml(37,5mg/ml)  12,5mg/0,333ml(37,5mg/ml)
1                         ad                         ad                         ad
2                        aaa                       aaaa                        aaa
like image 199
Pierre D Avatar answered Oct 28 '25 22:10

Pierre D


I think what you want to do is:

str_match = "({})".format("|".join(df1.Name))
df1.merge(df2['Name'], how='outer',
          left_on=df1['Name'].str.extract(str_match, expand=False)[0],
          right_on=df2['Name'].str.extract(str_match, expand=False)[0],
         ).drop(columns='key_0')

which gives:

                      Name_x                     Name_y
0  12,5mg/0,333ml(37,5mg/ml)  12,5mg/0,333ml(37,5mg/ml)
1                         ad                         ad
2                        aaa                       aaaa
like image 28
Serge de Gosson de Varennes Avatar answered Oct 28 '25 22:10

Serge de Gosson de Varennes



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!