Match a whole word in a string using dynamic regex

I want to check whether a word occurs in a sentence using regex. Words are separated by spaces, but may have punctuation on either side. If the word is in the middle of the string, the following match works (it prevents partial words from matching and allows punctuation on either side of the word).

match_middle_words = r" [^a-zA-Z\d ]{0,}" + word + r"[^a-zA-Z\d ]{0,} "

This won't, however, match the first or last word, since there is no leading/trailing space. So, for these cases, I have also been using:

match_starting_word = r"^[^a-zA-Z\d]{0,}" + word + r"[^a-zA-Z\d ]{0,} "
match_end_word = r" [^a-zA-Z\d ]{0,}" + word + r"[^a-zA-Z\d]{0,}$"

and then combining with

match_string = match_middle_words + "|" + match_starting_word + "|" + match_end_word

Is there a simple way to avoid needing three match terms? Specifically, is there a way of specifying 'either a space or the start of the string' (i.e. "^") and, similarly, 'either a space or the end of the string' (i.e. "$")?

asked May 01 '15 22:05

kyrenia

1 Answer

Why not use a word boundary?

match_string = r'\b' + word + r'\b'
match_string = r'\b{}\b'.format(word)
match_string = rf'\b{word}\b'          # f-strings require Python 3.6+

If you have a list of words (say, in a words variable) to be matched as a whole word, use

match_string = r'\b(?:{})\b'.format('|'.join(words))
match_string = rf'\b(?:{"|".join(words)})\b'         # f-strings require Python 3.6+

This way, the word is matched only when it is not adjacent to another word character (a letter, digit, or underscore) on either side. Also note that \b matches at the string start and end, so there is no need for 3 alternatives.

Sample code:

import re
strn = "word hereword word, there word"
search = "word"
print(re.findall(r"\b" + search + r"\b", strn))

And we found our 3 matches:

['word', 'word', 'word']
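The list variant shown earlier can be tried the same way. This is only a minimal sketch: the `words` list and the sample `text` are made up for illustration, and the words are assumed to consist of plain word characters.

```python
import re

# Hypothetical word list and sample text, for illustration only.
words = ["cat", "dog"]
text = "cat, catalog hotdog dog."

# Build one pattern that matches any of the words as a whole word.
pattern = re.compile(r"\b(?:{})\b".format("|".join(words)))

# "catalog" and "hotdog" contain the words but are not whole-word matches.
matches = pattern.findall(text)
print(matches)  # ['cat', 'dog']
```

The non-capturing group keeps both \b anchors applied to every alternative; without it, the first \b would bind only to the first word and the last \b only to the last one.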

NOTE ON "WORD" BOUNDARIES

When the "words" are in fact chunks of arbitrary characters, you should re.escape them before passing them to the regex pattern:

match_string = r'\b{}\b'.format(re.escape(word)) # a single escaped "word" string passed
match_string = r'\b(?:{})\b'.format("|".join(map(re.escape, words))) # words list is escaped
match_string = rf'\b(?:{"|".join(map(re.escape, words))})\b' # Same as above, using an f-string (Python 3.6+)
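To see why escaping matters, here is a small sketch with a made-up search term that contains a regex metacharacter (the term "a.b" and the sample text are assumptions for illustration):

```python
import re

# Hypothetical search term containing "." (which matches any character in a regex).
word = "a.b"
text = "axb a.b"

# Without escaping, "." acts as a wildcard, so "axb" matches too.
unescaped = re.findall(r"\b{}\b".format(word), text)

# With re.escape, "." is matched literally.
escaped = re.findall(r"\b{}\b".format(re.escape(word)), text)

print(unescaped)  # ['axb', 'a.b']
print(escaped)    # ['a.b']
```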

If the words to be matched as whole words may start or end with special (non-word) characters, \b won't work reliably; use unambiguous word boundaries instead:

match_string = r'(?<!\w){}(?!\w)'.format(re.escape(word))
match_string = r'(?<!\w)(?:{})(?!\w)'.format("|".join(map(re.escape, words)))
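As a rough illustration of the difference, consider a term that ends in non-word characters. The term "C++" and the two sample strings below are assumptions chosen for the demo:

```python
import re

# Hypothetical term ending in non-word characters.
word = "C++"
standalone = "I use C++ daily"
versioned = "I use C++11 daily"

b_pattern = re.compile(r"\b{}\b".format(re.escape(word)))
lookaround_pattern = re.compile(r"(?<!\w){}(?!\w)".format(re.escape(word)))

# \b needs a word/non-word transition, so after "+" it only matches
# when a word character follows: the opposite of what is wanted here.
b_standalone = b_pattern.search(standalone)   # None
b_versioned = b_pattern.search(versioned)     # matches the "C++" inside "C++11"

# The lookarounds only require that no word character is adjacent.
la_standalone = lookaround_pattern.search(standalone)  # matches "C++"
la_versioned = lookaround_pattern.search(versioned)    # None

print(b_standalone, b_versioned, la_standalone, la_versioned)
```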

If the word boundaries are whitespace chars or start/end of string, use whitespace boundaries, (?<!\S)...(?!\S):

match_string = r'(?<!\S){}(?!\S)'.format(re.escape(word))
match_string = r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, words)))
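A short sketch of how the whitespace boundaries behave, using a made-up sample string. Note that this variant rejects a word with adjacent punctuation, so it is stricter than the \b version:

```python
import re

# Hypothetical sample text, for illustration only.
word = "word"
text = "word, word word. (word) word"

pattern = re.compile(r"(?<!\S){}(?!\S)".format(re.escape(word)))

# Only occurrences delimited by whitespace or the string start/end match;
# "word,", "word." and "(word)" are skipped because punctuation touches them.
starts = [m.start() for m in pattern.finditer(text)]
print(starts)  # [6, 24]
```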

answered Nov 03 '22 13:11

Wiktor Stribiżew

Tags: python, regex