What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?
Perhaps an example will demonstrate what I mean:
string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2)  # this is True

string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2)  # this should be False
How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:
def string_found(string1, string2):
    if string2.find(string1 + " "):
        return True
    return False
But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)
Splitting on a specific substring: by passing an optional argument, .split('x') splits a string on the specific substring 'x'. Without an argument, .split() simply splits on runs of whitespace.
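As a rough illustration of how splitting relates to the whole-word question, the sketch below splits both strings on whitespace and looks for the words of one as a consecutive run in the other. The helper name words_found is hypothetical (it does not appear in the original text), and the sketch assumes words are separated only by whitespace.

```python
def words_found(needle, haystack):
    """Return True if the words of needle appear consecutively in haystack."""
    needle_words = needle.split()      # no argument: split on any whitespace
    haystack_words = haystack.split()
    n = len(needle_words)
    if n == 0:
        return False  # assumption: an empty needle never counts as found
    # Slide a window of n words over haystack and compare the word lists
    return any(
        haystack_words[i:i + n] == needle_words
        for i in range(len(haystack_words) - n + 1)
    )


print("a,b,c".split(","))  # splitting on a specific substring
print(words_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))
print(words_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))
```

Because whole words are compared, "ADVANCE" is not found inside "ADVANCED", but punctuation attached to a word (such as a trailing comma) would also prevent a match.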
The simplest way to check if a string contains a substring in Python is to use the in operator, which evaluates to True or False depending on whether the substring is found. For example:
sentence = 'There are more trees on Earth than stars in the Milky Way galaxy'
word = 'galaxy'
if word in sentence:
    print('Word found.')
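Note that in tests for a substring anywhere in the text, including inside a longer word, so on its own it does not answer the whole-word question. A small sketch using the strings from the question (the variable names are chosen here for illustration):

```python
name = "ADVANCE"
company = "ADVANCED BUSINESS EQUIPMENT LTD"

# `in` matches because "ADVANCE" is a prefix of "ADVANCED"
substring_match = name in company

# Comparing against the whitespace-separated words avoids the partial match
# (this only works when the search term is a single word)
whole_word_match = name in company.split()

print(substring_match, whole_word_match)  # True False
```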
You can use the word boundary metacharacter '\b' to match only whole words.
Use the string.count() function to find all occurrences of a substring in a string in Python. string.count() is a built-in string method that returns the number of non-overlapping occurrences of a substring in a given string.
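Like the in operator, count() works on raw substrings, so it also counts occurrences inside longer words. The sketch below contrasts it with counting whole-word regex matches; the sample text is made up for illustration.

```python
import re

text = "ADVANCE payments to ADVANCED BUSINESS EQUIPMENT LTD"
word = "ADVANCE"

# str.count counts non-overlapping substring occurrences, partial words included
substring_count = text.count(word)

# Counting regex matches with \b on both sides counts whole words only
whole_word_count = len(re.findall(r"\b" + re.escape(word) + r"\b", text))

print(substring_count, whole_word_count)  # 2 1
```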
You can use regular expressions and the word boundary special character \b (emphasis mine):
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
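import re

def string_found(string1, string2):
    # \b requires a word boundary on each side; re.escape protects any
    # regex metacharacters that string1 may contain
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return True
    return False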
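A quick check of the regex approach against the examples from the question. The function is restated so the snippet runs on its own; the end-of-string, punctuation, and underscore cases are added illustrations, not part of the question.

```python
import re


def string_found(string1, string2):
    # Same approach as above: require a word boundary on both sides
    return re.search(r"\b" + re.escape(string1) + r"\b", string2) is not None


assert string_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP")
assert not string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD")

# Matches at the very end of the string also count
assert string_found("LLP", "ADDLESHAW GODDARD LLP")

# Punctuation is a non-word character, so it also forms a boundary
assert string_found("GODDARD", "ADDLESHAW GODDARD, LLP")

# An underscore is a word character, so there is no boundary before "_LLP"
assert not string_found("GODDARD", "ADDLESHAW GODDARD_LLP")
```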
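If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to each string:
def string_found(string1, string2):
    # Pad both strings so that matches at the start or end of string2
    # are surrounded by spaces as well
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # find() returns -1 when there is no match (and 0 for a match at the
    # start), so compare against -1 instead of returning the index
    return string2.find(string1) != -1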
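If whitespace-only boundaries are what you want but you would rather stay with a regex, one possible variant uses lookarounds instead of padding. This is only a sketch, with the hypothetical name string_found_ws; it treats the start and end of the string as boundaries too.

```python
import re


def string_found_ws(string1, string2):
    # (?<!\S): not preceded by a non-whitespace character
    # (?!\S):  not followed by a non-whitespace character
    pattern = r"(?<!\S)" + re.escape(string1.strip()) + r"(?!\S)"
    return re.search(pattern, string2) is not None


print(string_found_ws("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))    # True
print(string_found_ws("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))    # False
# Unlike \b, a trailing comma is not treated as a boundary here
print(string_found_ws("GODDARD", "ADDLESHAW GODDARD, LLP"))             # False
# Search terms that end in punctuation work, which \b would not handle
print(string_found_ws("C++", "I write C++ daily"))                      # True
```

The trade-off is the one already described: with whitespace-only boundaries, a word followed directly by punctuation no longer matches, whereas \b would accept it.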