I have a lookup problem. I tried combining replace, dict and zip (see below), but that does not produce my desired result because characters (underscores) are removed in the process.
df1 contains unique strings with underscores arranged in a specific pattern:
import pandas as pd
df1 = pd.DataFrame([['1_1','1_2', '2_1', '2_2'],['1_3','1_4', '2_3', '2_4']])
df1
     0    1    2    3
0  1_1  1_2  2_1  2_2
1  1_3  1_4  2_3  2_4
df2 contains a lookup table (column a holds the string, column b the replacement value) for some of the strings in df1:
df2 = pd.DataFrame([['1_1',234],['1_2',456],['2_3',324],['2_4',765]], columns = ['a', 'b'])
df2
     a    b
0  1_1  234
1  1_2  456
2  2_3  324
3  2_4  765
I want to create df3, where each string in df1 that exactly matches an entry in df2.a is replaced with the corresponding value in df2.b. However, when I run the following code, the strings that are not contained in df2 (2_1, 2_2, etc.) lose their underscores in df3.
df3 = df1.replace(dict(zip(df2.a, df2.b)))
df3
     0    1    2    3
0  234  456   21   22
1   13   14  324  765
The desired result in df3 should instead be:
     0    1    2    3
0  234  456  2_1  2_2
1  1_3  1_4  324  765
Or, alternatively:
     0    1    2    3
0  234  456  NaN  NaN
1  NaN  NaN  324  765
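As an alternative, you can use DataFrame.mask, so that only the cells whose string appears in df2.a take the replaced value and all other cells are left untouched:
# Series mapping each string in df2.a to its value in df2.b
s = df2.set_index('a')['b']
# Take the replaced value only where the cell is a key of s
df1.mask(df1.isin(s.index), df1.replace(s))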
     0    1    2    3
0  234  456  2_1  2_2
1  1_3  1_4  324  765
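If you would rather avoid replace altogether, here is a minimal sketch of a plain dictionary lookup applied column by column with Series.map. It is not the method from the answer above, just another way to get both desired outputs from the question. The names lookup, df3_keep and df3_nan are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame([['1_1', '1_2', '2_1', '2_2'],
                    ['1_3', '1_4', '2_3', '2_4']])
df2 = pd.DataFrame([['1_1', 234], ['1_2', 456], ['2_3', 324], ['2_4', 765]],
                   columns=['a', 'b'])

# Plain dict: string in df2.a -> value in df2.b (hypothetical name)
lookup = dict(zip(df2['a'], df2['b']))

# Variant 1: exact-match lookup, strings without an entry are kept as they are
df3_keep = df1.apply(lambda col: col.map(lambda v: lookup.get(v, v)))

# Variant 2: strings without an entry become NaN
df3_nan = df1.apply(lambda col: col.map(lookup))

print(df3_keep)
print(df3_nan)
```

Note that the second variant yields float columns, because NaN cannot be stored in an integer column, so the matched values print as 234.0, 456.0 and so on.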