How to match a whole word with a regular expression?

People also ask

How do you match a whole word in Python?

To match whole exact words, use the word boundary metacharacter '\b'. This metacharacter matches at the beginning and end of each word, but it doesn't consume any characters. In other words, it only checks whether a word starts or ends at this position, that is, whether a word character (letter, digit, or underscore) sits next to a non-word character or next to the start or end of the string.
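A minimal sketch of the zero-width behavior described above. The sample text is made up for illustration; the spans show that each match covers exactly the three letters of "cat" and nothing around them.

```python
import re

# Hypothetical sample text: "cat" appears alone twice and once inside "concatenate".
text = "cat concatenate cat."

# finditer yields match objects, so we can inspect where each match starts and ends.
spans = [m.span() for m in re.finditer(r"\bcat\b", text)]
print(spans)  # [(0, 3), (16, 19)]

# The boundaries consume nothing: every matched substring is just "cat".
matched = [text[start:end] for start, end in spans]
print(matched)  # ['cat', 'cat']
```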

Which regex matches the whole words dog or cat?

To match cat or dog as whole words only, we would need to use \b(cat|dog)\b. This tells the regex engine to find a word boundary, then either cat or dog, and then another word boundary.
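As a rough illustration in Python, the pattern can be tried against a made-up sentence. The sketch uses a non-capturing group (?:...) instead of (cat|dog); that is a choice made here, not something the text above requires, and findall returns the same words either way.

```python
import re

# Same idea as \b(cat|dog)\b, written with a non-capturing group.
pattern = re.compile(r"\b(?:cat|dog)\b")

# Hypothetical sentence: "catalog" and "bulldog" contain the words but are not whole-word hits.
text = "The cat chased a dog past a catalog and a bulldog."

whole_words = pattern.findall(text)
print(whole_words)  # ['cat', 'dog']
```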

How do I match a range in regex?

The regex [0-9] matches the single-digit numbers 0 to 9, and [1-9][0-9] matches the double-digit numbers 10 to 99. Note that each bracket constrains one character, not the number as a whole: something like ^[2-9][1-6]$ matches 21 and even 96, but not 27.
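A small sketch of these range patterns in Python, assuming the candidates are checked as complete strings (hence the ^ and $ anchors). The candidate lists are invented for the example.

```python
import re

# Two-digit numbers 10..99: first digit 1-9, second digit 0-9.
two_digit = re.compile(r"^[1-9][0-9]$")
candidates = ["7", "10", "42", "99", "100", "07"]
two_digit_hits = [s for s in candidates if two_digit.search(s)]
print(two_digit_hits)  # ['10', '42', '99']

# Each bracket works on one character, so this is not "the range 21 to 96".
per_digit = re.compile(r"^[2-9][1-6]$")
per_digit_hits = [s for s in ["21", "27", "96", "16"] if per_digit.search(s)]
print(per_digit_hits)  # ['21', '96']
```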

Try

re.search(r'\bis\b', your_string)

From the docs:

\b Matches the empty string, but only at the beginning or end of a word.

Note that the re module uses a naive definition of "word" as a "sequence of alphanumeric or underscore characters", where "alphanumeric" depends on the locale or Unicode options in effect.
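To see how the definition of "alphanumeric" changes where \b matches, here is a small sketch comparing the default behavior for str patterns in Python 3 with the re.ASCII flag. The sample string is an assumption chosen for the demonstration.

```python
import re

# "\u00e9" is the single character e-acute, written as an escape to avoid encoding surprises.
text = "un caf\u00e9"

# Default for str patterns in Python 3: e-acute is a word character,
# so there is no boundary between "caf" and the accented letter.
default_match = re.search(r"\bcaf\b", text)
print(default_match)  # None

# With re.ASCII only [a-zA-Z0-9_] count as word characters,
# so a boundary appears right after "caf".
ascii_match = re.search(r"\bcaf\b", text, flags=re.ASCII)
print(ascii_match.group())  # caf
```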

Also note that without the raw string prefix, \b is seen as "backspace" instead of regex word boundary.
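The difference is easy to check. In this sketch the two patterns look alike in source code, but the one without the r prefix contains two literal backspace characters, so it cannot match ordinary text.

```python
import re

text = "this is a sample"

raw_pattern = r"\bis\b"   # backslash + "b": regex word boundary
plain_pattern = "\bis\b"  # "\b" is converted to the backspace character (0x08)

# The raw string keeps both backslashes; the plain string does not.
print(len(raw_pattern), len(plain_pattern))  # 6 4

raw_found = re.search(raw_pattern, text) is not None
plain_found = re.search(plain_pattern, text) is not None
print(raw_found, plain_found)  # True False
```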

Try using the word boundary assertion \b with Python's regular expression module, re:

>>> import re
>>> x = "this is a sample"
>>> y = "this isis a sample."
>>> regex = re.compile(r"\bis\b")  # To ignore case: re.compile(r"\bis\b", re.IGNORECASE)
>>> regex.findall(y)
[]
>>> regex.findall(x)
['is']

From the documentation of the re module:

\b matches the empty string, but only at the beginning or end of a word

...

For example r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'
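The cases listed in the documentation can be checked directly. The extra 'foo_bar' case is added here on the grounds that the underscore counts as a word character.

```python
import re

pattern = re.compile(r"\bfoo\b")

should_match = ["foo", "foo.", "(foo)", "bar foo baz"]
# 'foo_bar' is an extra case: "_" is a word character, so no boundary follows "foo".
should_not_match = ["foobar", "foo3", "foo_bar"]

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_not_match if not pattern.search(s)]

print(matched == should_match)       # True
print(rejected == should_not_match)  # True
```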

I think the answers given did not completely achieve the behavior the OP wanted. Specifically, none of them produces the desired boolean output. The answers do help illustrate the concept, and I think they are excellent. Perhaps I can illustrate what I mean by looking at why the OP chose the examples in the question.

The string given was,

a = "this is a sample"

The OP then stated,

I want to match whole word - for example match "hi" should return False since "hi" is not a word ...

As I understand it, this refers to the search token "hi" as it is found inside the word "this". If someone were to search the string a for the word "hi", they should receive False as the response.

The OP continues,

... and "is" should return True since there is no alpha character on the left and on the right side.

In this case, the reference is to the search token "is" as it is found in the word "is". I hope this helps clarify why we use word boundaries. The other answers have the behavior of "don't return a word unless that word is found by itself, not inside of other words." The word boundary assertion \b does this job nicely.

Only the word "is" has been used in examples up to this point. I think these answers are correct, but there is more of the question's fundamental meaning that needs to be addressed: the behavior of other search strings should be noted to understand the concept. In other words, we need to generalize the (excellent) answer by @georg using re.search(r"\bis\b", your_string). The same r"\bis\b" concept is also used in the answer by @OmPrakash, who started the generalizing discussion by showing:

>>> y="this isis a sample."
>>> regex=re.compile(r"\bis\b")  # For ignore case: re.compile(r"\bis\b", re.IGNORECASE)
>>> regex.findall(y)
[]

Let's say the method which should exhibit the behavior I've discussed is named

find_only_whole_word(search_string, input_string)

The following behavior should then be expected.

>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True

Once again, this is how I understand the OP's question. The answer from @georg is a step towards that behavior, but its result is a little hard to interpret. To wit:

>>> import re
>>> a = "this is a sample"
>>> re.search(r"\bis\b", a)
<_sre.SRE_Match object; span=(5, 7), match='is'>
>>> re.search(r"\bhi\b", a)
>>>

There is no output from the second command because re.search returned None. The useful answer from @OmPrakash shows output, but not True or False.
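Since re.search returns either a match object or None, one compact way to get a True/False answer is to pass the result to bool(). This is only a sketch of the idea, reusing the string a from the examples above.

```python
import re

a = "this is a sample"

# A match object is always truthy and None is falsy, so bool() gives the yes/no answer.
is_found = bool(re.search(r"\bis\b", a))
hi_found = bool(re.search(r"\bhi\b", a))

print(is_found)  # True
print(hi_found)  # False
```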

Here's a more complete sampling of the behavior to be expected.

>>> find_only_whole_word("this", a)
True
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("a", a)
True
>>> find_only_whole_word("sample", a)
True
# Use "ample", part of the word, "sample": (s)ample
>>> find_only_whole_word("ample", a)
False
# (t)his
>>> find_only_whole_word("his", a)
False
# (sa)mpl(e)
>>> find_only_whole_word("mpl", a)
False
# Any random word
>>> find_only_whole_word("applesauce", a)
False
>>>

This can be accomplished by the following code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
#@file find_only_whole_word.py

import re

def find_only_whole_word(search_string, input_string):
    """Return True if search_string occurs in input_string as a whole word."""
    # Escape any regex metacharacters in the user's search_string, then wrap
    # it in word boundaries so it cannot match inside a longer word.
    raw_search_string = r"\b" + re.escape(search_string) + r"\b"

    match_output = re.search(raw_search_string, input_string)
    ## As noted by @OmPrakash, if you want to ignore case, uncomment
    ## the next two lines
    #match_output = re.search(raw_search_string, input_string,
    #                         flags=re.IGNORECASE)

    # re.search returns None when no match was found
    return match_output is not None

##endof:  find_only_whole_word(search_string, input_string)

A simple demonstration follows. Run the Python interpreter from the same directory where you saved the file, find_only_whole_word.py.

>>> from find_only_whole_word import find_only_whole_word
>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("cucumber", a)
False
# The excellent example from @OmPrakash
>>> find_only_whole_word("is", "this isis a sample")
False
>>>
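One limitation worth noting: \b only sees a boundary between a word character and a non-word character, so a search string that starts or ends with punctuation (for example "c++") will not match even when it stands alone. A possible variant, sketched here under the assumption that "whole word" means "not touching any word character", uses lookarounds instead of \b. The name find_whole_token is hypothetical and not part of any answer above.

```python
import re

def find_whole_token(search_string, input_string):
    """Hypothetical variant: True if search_string occurs with no word
    character directly before or after it."""
    # re.escape makes characters such as "+" literal.
    # (?<!\w) means "not preceded by a word character";
    # (?!\w) means "not followed by a word character".
    pattern = r"(?<!\w)" + re.escape(search_string) + r"(?!\w)"
    return re.search(pattern, input_string) is not None

print(find_whole_token("is", "this is a sample"))    # True
print(find_whole_token("hi", "this is a sample"))    # False
print(find_whole_token("is", "this isis a sample"))  # False
print(find_whole_token("c++", "I like c++ a lot"))   # True

# For comparison, the \b version misses "c++" because "+" and " " are both non-word characters.
print(re.search(r"\b" + re.escape("c++") + r"\b", "I like c++ a lot"))  # None
```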


Tags: python, regex
