I have a DataFrame with a variable I want to map, using a dictionary where the keys are not "normal" strings, but regular expressions.
import pandas as pd
import re
df = pd.DataFrame({'cat': ['A1', 'A2', 'B1']})
What I would like to do is df['cat'].map({'A\d': 'a', 'B1': 'b'}), but A\d does not seem to be interpreted as a regex. In this simple MWE I could do df['cat'].map({'A1': 'a', 'A2': 'a', 'B1': 'b'}), but in the real world the regex is much more complicated. The dictionary is also much more complicated, so the solution here (which requires adding start and end statements and applying re.compile around the keys) is not feasible.
replace with regex=True
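map takes a callable. When you pass it a dictionary, it effectively replaces the dictionary with lambda x: your_dict.get(x), an exact key lookup in which values without a matching key become NaN, so the keys are never treated as patterns. For your purposes, replace is appropriate.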
df.replace({r'A\d': 'a', 'B1': 'b'}, regex=True)
cat
0 a
1 a
2 b
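One caveat worth knowing: with regex=True, replace substitutes every matching substring rather than requiring the whole cell to match, so a value like 'A1x' would become 'ax'. If whole-cell matching is what you want, a minimal sketch is to anchor every key programmatically instead of editing the dictionary by hand. The names patterns and anchored are illustrative, and the sketch assumes each key can safely be wrapped in a non-capturing group:

```python
import pandas as pd

df = pd.DataFrame({'cat': ['A1', 'A2', 'B1']})

# Raw strings avoid invalid-escape warnings for sequences such as \d
patterns = {r'A\d': 'a', r'B1': 'b'}

# Wrap each key as ^(?:...)$ so only full-cell matches are replaced
anchored = {rf'^(?:{k})$': v for k, v in patterns.items()}

result = df['cat'].replace(anchored, regex=True)

# Compare unanchored and anchored behaviour on a value with extra characters
partial = pd.Series(['A1x']).replace(patterns, regex=True)
strict = pd.Series(['A1x']).replace(anchored, regex=True)
```

Here result holds the mapped column, partial shows the substring substitution, and strict leaves 'A1x' untouched because the anchored pattern does not match the whole value.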
I'm not sure how complicated your dictionary is. But if it is not too long, we can just match and replace one by one:
maps = {r'A\d': 'a', 'B1': 'b'}
(pd.concat((df['cat'].str.match(k) for k in maps), axis=1, ignore_index=True)
   .dot(pd.Series(list(maps.values())))
)
Output:
0 a
1 a
2 b
dtype: object
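Note that str.match only anchors at the start of the string, and if several patterns match the same value, the dot product concatenates their replacements. Since map accepts a callable, another way to match one by one is a small lookup function that returns the first full match. This is only a sketch: regex_lookup and compiled are hypothetical names, and the extra 'C9' row is added purely to show what happens when nothing matches:

```python
import re
import pandas as pd

df = pd.DataFrame({'cat': ['A1', 'A2', 'B1', 'C9']})
maps = {r'A\d': 'a', r'B1': 'b'}

# Compile once so each pattern is not re-parsed for every row
compiled = [(re.compile(k), v) for k, v in maps.items()]

def regex_lookup(value, default=None):
    """Return the replacement for the first pattern that fully matches value."""
    for pattern, replacement in compiled:
        if pattern.fullmatch(value):
            return replacement
    return default  # no pattern matched

mapped = df['cat'].map(regex_lookup)
```

The first three rows map to 'a', 'a' and 'b', while 'C9' ends up missing. Dictionary order decides which pattern wins when more than one matches, so put the most specific patterns first.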