
Replace words in pandas Dataframe using dictionary

I have a pandas DataFrame:

id  text
1   acclrtr actn corr cr
2   plate corr aff
3   alrm alt

and a dictionary:

dict={'acclrtr':'accelerator','actn':'action','corr':'corrosion','cr':'chemical resistant','aff':'affinity','alrm':'alarm','alt':'alternate'}

I need to replace every dictionary key found in the DataFrame with its value.

I tried the following snippets, but none of them worked correctly:

1.

data['text']=data['text'].str.replace(dict.keys(), dict.values())

2.

data['text']=data['text'].replace(dict, inplace=True)

3.

data['text']=data['text'].apply(lambda x: [item.replace(to_replace=dict) for item in x])

4.

for key, value in dict.items():
    data['text']=data['text'].apply(lambda x: list(set([item.replace(key,value) for item in x])))

Can anyone tell me what I am doing wrong and how to replace each key with its value properly?

Ranjana Girish asked Jan 03 '23 19:01



1 Answer

UPDATE:

In [108]: data
Out[108]:
   id                  text
0   1  acclrtr actn corr cr
1   2   plate corr affinity   # NOTE: `affinity`
2   3              alrm alt

In [109]: d2 = {r'(\b){}(\b)'.format(k):r'\1{}\2'.format(v) for k,v in d.items()}

In [110]: d2
Out[110]:
{'(\\b)acclrtr(\\b)': '\\1accelerator\\2',
 '(\\b)actn(\\b)': '\\1action\\2',
 '(\\b)aff(\\b)': '\\1affinity\\2',
 '(\\b)alrm(\\b)': '\\1alarm\\2',
 '(\\b)alt(\\b)': '\\1alternate\\2',
 '(\\b)corr(\\b)': '\\1corrosion\\2',
 '(\\b)cr(\\b)': '\\1chemical resistant\\2'}

In [111]: data['text'] = data['text'].replace(d2, regex=True)

In [112]: data
Out[112]:
   id                                             text
0   1  accelerator action corrosion chemical resistant
1   2                         plate corrosion affinity
2   3                                  alarm alternate

where d is the replacement dictionary (the one named dict in the question).
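For anyone who wants to run this outside an IPython session, here is a minimal, self-contained sketch of the same approach. It assumes the sample data from the UPDATE above. The re.escape call is an addition that is not in the session above; it only matters if a key contains regex metacharacters, and it changes nothing for these keys.

```python
import re

import pandas as pd

# Sample data from the UPDATE above (row 2 already contains "affinity").
data = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["acclrtr actn corr cr", "plate corr affinity", "alrm alt"],
})

# Replacement dictionary, named d so it does not shadow the built-in dict.
d = {
    "acclrtr": "accelerator",
    "actn": "action",
    "corr": "corrosion",
    "cr": "chemical resistant",
    "aff": "affinity",
    "alrm": "alarm",
    "alt": "alternate",
}

# Wrap every key in word boundaries so that only whole words are replaced.
# re.escape is an extra safeguard (an assumption, not part of the answer).
d2 = {
    r"(\b){}(\b)".format(re.escape(k)): r"\1{}\2".format(v)
    for k, v in d.items()
}

data["text"] = data["text"].replace(d2, regex=True)
print(data["text"].tolist())
```

The word "affinity" in the second row is left alone because "aff" is not followed by a word boundary there.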

PS: don't use the names of built-in types (dict, list, etc.) as variable names. Doing so shadows the built-in Python types, so you won't be able to use them properly:

In [1]: dict = dict(a='aaa', b='bbb')

In [2]: dict
Out[2]: {'a': 'aaa', 'b': 'bbb'}

In [3]: dict2 = dict(c='ccc')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-650e1aa39edb> in <module>()
----> 1 dict2 = dict(c='ccc')

TypeError: 'dict' object is not callable

RegEx explanation:

'(\\b)word(\\b)' - means search for a word that is preceded and followed by a word boundary, and put both word boundaries in capturing groups: the first pair of parentheses is the 1st capturing group, and so on.

\\1 - in the substitution part, says to insert the contents of the first capturing group (a word boundary in our case)
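To see the effect of the word boundaries in isolation, here is a small sketch that uses only Python's re module. The variable names are illustrative. It compares the boundary-aware pattern with a plain substring replacement on the 'aff' key:

```python
import re

pattern = r"(\b)aff(\b)"
replacement = r"\1affinity\2"

# "aff" stands alone as a word, so it is replaced.
whole_word = re.sub(pattern, replacement, "plate corr aff")

# "aff" is only the start of "affinity": no boundary follows it, so no match.
inside_word = re.sub(pattern, replacement, "plate corr affinity")

# A plain substring replacement has no notion of word boundaries.
naive = "plate corr affinity".replace("aff", "affinity")

print(whole_word)   # plate corr affinity
print(inside_word)  # plate corr affinity
print(naive)        # plate corr affinityinity
```

Since a word boundary is zero-width, both groups always capture an empty string, so the shorter pattern r"\baff\b" with the plain replacement "affinity" should give the same result.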

MaxU - stop WAR against UA answered Jan 05 '23 15:01
