
Python regular expression to search for words in a sentence

Tags:

python

regex

I'm still learning the ropes with Python and regular expressions and I need some help, please! I need a regular expression that can search a sentence for specific words. I have managed to create a pattern that finds a single word, but how do I retrieve the other words I need to find? What would the re pattern look like to do this?

>>> import re
>>> question = "the total number of staff in 30?"
>>> re_pattern = r'\btotal.*?\b'
>>> m = re.findall(re_pattern, question)
>>> m
['total']

It must look for the words "total" and "staff". Thanks, Mike

Mike Barnes asked Jun 03 '26 18:06


2 Answers

Use the alternation operator | to search for all the words you need to find:

In [20]: re_pattern = r'\b(?:total|staff)\b'

In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']
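
If the list of words is not fixed in advance, the same alternation can be assembled programmatically. The following is a minimal sketch and not part of the original answer: the helper name build_word_pattern is hypothetical, and re.escape is applied on the assumption that some search words might contain regex metacharacters.

```python
import re

def build_word_pattern(words):
    """Return a compiled pattern matching any of `words` as a whole word.

    Hypothetical helper for illustration; not from the original answer.
    """
    # re.escape guards against metacharacters such as '.' or '+' in a word.
    alternation = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:" + alternation + r")\b")

question = "the total number of staff in 30?"
pattern = build_word_pattern(["total", "staff"])
print(pattern.findall(question))  # ['total', 'staff']
```

Compiling the pattern once is mainly a convenience when the same word list is checked against many sentences.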

This matches your example above most closely. The word boundary \b matches between a word character (a letter, digit or underscore) and a non-word character, or at the start or end of the string. Punctuation appended to a word, such as a comma, a period, an exclamation mark or a question mark at the end of a clause, therefore does not prevent a match.

For example, in the question How many people are in your staff? the pattern above does find the word staff, because there is a word boundary between the final f and the question mark. The trailing \b is still needed: if you leave it out, the expression wrongly detects words inside longer words, such as total in totally or totalities.
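
It is worth checking directly how \b behaves here. In Python's re module, \b matches between a word character and a non-word character, so trailing punctuation does not block a match, while a longer word such as totally does. The short sketch below uses sentences from this answer; the variable names are illustrative only.

```python
import re

re_pattern = r'\b(?:total|staff)\b'

# '?' is not a word character, so a boundary exists between 'f' and '?'.
with_punctuation = re.findall(re_pattern, 'How many people are in your staff?')

# 'totally' continues with word characters after 'total', so the closing \b fails.
inside_longer_word = re.findall(re_pattern, 'My staff is totally overworked.')

# Without the closing \b, 'total' is found inside 'totally'.
no_closing_boundary = re.findall(r'\b(?:total|staff)', 'My staff is totally overworked.')

print(with_punctuation)     # ['staff']
print(inside_longer_word)   # ['staff']
print(no_closing_boundary)  # ['staff', 'total']
```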

Another way to accomplish what you want is to extract all words (runs of letters, digits and underscores) from your sentence first and then check this list for the words you need to find:

In [51]: def find_all_words(words, sentence):
....:     all_words = re.findall(r'\w+', sentence)
....:     words_found = []
....:     for word in words:
....:         if word in all_words:
....:             words_found.append(word)
....:     return words_found

In [52]: print(find_all_words(['total', 'staff'], 'The total number of staff in 30?'))
['total', 'staff']

In [53]: print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))
['staff']
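
Note that the membership test in find_all_words is case-sensitive, so it would miss Total at the start of a sentence. One possible variant, sketched here under the assumption that case-insensitive matching is wanted, lowercases both sides and uses a set for the lookups. The name find_all_words_ci is hypothetical and not part of the original answer.

```python
import re

def find_all_words_ci(words, sentence):
    """Case-insensitive variant of find_all_words (hypothetical name)."""
    # A set gives constant-time membership checks on average.
    all_words = set(re.findall(r'\w+', sentence.lower()))
    # Preserve the order and original spelling of the requested words.
    return [word for word in words if word.lower() in all_words]

print(find_all_words_ci(['total', 'staff'], 'Total staff: 30.'))
# ['total', 'staff']
print(find_all_words_ci(['total', 'staff'], 'My staff is totally overworked.'))
# ['staff']
```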
pemistahl answered Jun 06 '26 09:06


>>> import re
>>> question = "the total number of staff in 30?"
>>> find = ["total", "staff"]
>>> words = re.findall(r"\w+", question)
>>> result = [x for x in find if x in words]
>>> result
['total', 'staff']
daya answered Jun 06 '26 09:06


