Find substring in string but only if whole words?

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?

Perhaps an example will demonstrate what I mean:

    string1 = "ADDLESHAW GODDARD"
    string2 = "ADDLESHAW GODDARD LLP"
    assert string_found(string1, string2)  # this is True

    string1 = "ADVANCE"
    string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
    assert not string_found(string1, string2)  # this should be False

How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:

    def string_found(string1, string2):
        if string2.find(string1 + " "):
            return True
        return False

But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)

AP257 asked Nov 11 '10 13:11


1 Answer

You can use regular expressions and the word boundary special character \b. The Python re documentation describes it as follows (emphasis mine):

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

    import re

    def string_found(string1, string2):
        # \b anchors the escaped search string at a word boundary on both sides.
        if re.search(r"\b" + re.escape(string1) + r"\b", string2):
            return True
        return False
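To see how the word-boundary pattern behaves, here is a small self-contained check against the two cases from the question, plus a few edge cases. The extra test strings are made up for illustration, and the function body is condensed to a single boolean expression; treat it as a sketch rather than the only way to write it.

```python
import re

def string_found(string1, string2):
    """Return True if string1 occurs in string2 delimited by word boundaries."""
    pattern = r"\b" + re.escape(string1) + r"\b"
    return re.search(pattern, string2) is not None

# The two cases from the question.
print(string_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))      # True
print(string_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))      # False

# A match at the very end of the string still counts.
print(string_found("LLP", "ADDLESHAW GODDARD LLP"))                    # True

# Punctuation acts as a boundary, but an underscore is a word character.
print(string_found("GODDARD", "ADDLESHAW-GODDARD, LLP"))               # True
print(string_found("GODDARD", "ADDLESHAW_GODDARD"))                    # False
```

re.escape matters here because the search string is user data: without it, characters such as "." or "(" in string1 would be interpreted as regex syntax.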


If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to both strings:

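    def string_found(string1, string2):
        # Pad both strings so matches at the start or end are space-delimited too.
        string1 = " " + string1.strip() + " "
        string2 = " " + string2.strip() + " "
        # find() returns -1 when there is no match, so compare explicitly.
        return string2.find(string1) != -1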
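The space-padding approach treats only the space character as a boundary, so it diverges from \b as soon as punctuation is involved. The sketch below compares the two; the names found_by_spaces and found_by_boundary are illustrative and not from the answer. It uses the in operator for the containment test, because str.find() returns -1 (a truthy value) when nothing is found and so is easy to misuse as a boolean.

```python
import re

def found_by_spaces(string1, string2):
    # Pad both strings so a match at the start or end is space-delimited too.
    needle = " " + string1.strip() + " "
    haystack = " " + string2.strip() + " "
    return needle in haystack

def found_by_boundary(string1, string2):
    # Regex variant: any non-word character (or the string edge) is a boundary.
    return re.search(r"\b" + re.escape(string1) + r"\b", string2) is not None

text = "ADDLESHAW GODDARD, LLP"
print(found_by_boundary("GODDARD", text))  # True: the comma is a boundary
print(found_by_spaces("GODDARD", text))    # False: "GODDARD," is not "GODDARD "

# Both agree when words are separated by plain spaces.
print(found_by_spaces("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))    # False
print(found_by_spaces("LLP", "ADDLESHAW GODDARD LLP"))                  # True
```

Which behaviour is right depends on the data: for company names with trailing commas or full stops, the regex version is probably the safer choice.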
Felix Kling answered Sep 23 '22 16:09