Parsing text with Python and mapping to dictionary words

Question

I am trying to build a dictionary of frequent terms for my website. I will retrieve a paragraph from my database, and this paragraph will most likely contain terms that appear in that dictionary. What I am looking for is a nice, fast way to parse the paragraph text and map any terms that appear in it to the corresponding dictionary entries.

Is there a Python module which can assist me with this task? I am not looking for something fancy but it must be fast.

Thanks

Tim Pietzcker · Accepted Answer

Something like this?

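>>> import re
>>> s = "abc def, abcdef"
>>> w = {"abc": "xxx", "def": "yyy"}
>>> def replace(text, words):
...     # Match any dictionary key as a whole word, ignoring case.
...     regex = r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b"
...     reobj = re.compile(regex, re.I)
...     # Keys are assumed to be lower-case; lower the match so "ABC" finds "abc".
...     return reobj.sub(lambda x: words[x.group(0).lower()], text)
...
>>> replace(s, w)
'xxx yyy, abcdef'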

Note that this only works reliably if all the dictionary's keys start and end with a letter (or a digit or underscore). Otherwise, the \b word boundaries fail to match.
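If some keys do begin or end with punctuation (say "c++" or ".net"), one possible workaround is to replace \b with the lookarounds (?<!\w) and (?!\w), which only require that no word character sits directly next to the match. The following is a rough sketch under that assumption, not a tested drop-in: build_pattern, find_terms and replace_terms are made-up helper names, and the sample terms are invented. It also sorts keys longest first so that a longer term is tried before a shorter one it starts with, and it shows how to list the matched terms with their dictionary entries instead of replacing them.

```python
import re

def build_pattern(words):
    # Longest keys first, so a long term is tried before a shorter prefix of it.
    keys = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # (?<!\w) / (?!\w): no word character directly before / after the match.
    # Unlike \b, this also works when the key itself starts or ends with punctuation.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)

def find_terms(text, words):
    """Return (matched_text, dictionary_value) pairs in order of appearance."""
    lookup = {k.lower(): v for k, v in words.items()}  # case-insensitive lookup
    pattern = build_pattern(words)
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]

def replace_terms(text, words):
    """Replace every matched term with its dictionary value."""
    lookup = {k.lower(): v for k, v in words.items()}
    pattern = build_pattern(words)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

# Hypothetical sample data, for illustration only.
terms = {"c++": "CPP", ".net": "DOTNET", "abc": "xxx"}
sample = "I use C++ and .NET, not abcdef."
print(find_terms(sample, terms))
print(replace_terms(sample, terms))
```

If the same dictionary is used for many paragraphs, compiling the pattern once and reusing it should save the cost of rebuilding it on every call.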

Tags: python, parsing

George Eracleous