How to write a word boundary inside a character class in Python without losing its meaning? I want to add the underscore (_) to the definition of a word boundary (\b)

Question

I am aware that the definition of a word boundary is (?<!\w)(?=\w)|(?<=\w)(?!\w), and I want to optionally include the underscore in that definition as well.
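As a quick sanity check of that expansion, the sketch below compares the positions where \b matches with the positions matched by the lookaround form. The helper name boundary_positions and the sample string are made up for illustration.

```python
import re

def boundary_positions(pattern, text):
    """Return the offsets of every zero-width match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

text = "some_word here"

# Built-in word boundary vs. its lookaround expansion.
builtin = boundary_positions(r"\b", text)
expanded = boundary_positions(r"(?<!\w)(?=\w)|(?<=\w)(?!\w)", text)

print(builtin)              # [0, 9, 10, 14]
print(builtin == expanded)  # True
```

Note that offset 4 and offset 5 (either side of the underscore) are not in the list: _ is itself a word character, which is why \b alone does not separate "some", "word" and "here" in "some_word".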

One way to do it is to modify the definition directly, so the new one would be (_)?((?<!\w)(?=\w)|(?<=\w)(?!\w)), but I don't want to use such a long expression.

An easy approach would be to write the word boundary inside a character class; adding the underscore would then be as simple as [\b_]. The problem is that \b inside a character class, i.e. [\b], means the backspace character, not a word boundary.
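A small sketch of that behaviour, using a made-up sample string: inside a character class \b is the backspace character (U+0008), so a class containing \b and _ only ever matches a literal backspace or a literal underscore, never a position.

```python
import re

# "\b" in a normal (non-raw) string literal is the backspace character U+0008.
text = "a_b\bc"

# Inside [...] the regex escape \b means backspace, not a word boundary.
only_backspace = re.findall(r"[\b]", text)
backspace_or_underscore = re.findall(r"[\b_]", text)

print(only_backspace)            # ['\x08']
print(backspace_or_underscore)   # ['_', '\x08']
```

A character class always consumes exactly one character, so it cannot express a zero-width assertion such as a word boundary at all.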

Please tell me the solution, i.e. how to put \b inside a character class without losing its original meaning.

Wiktor Stribiżew · Accepted Answer

You may use lookarounds:

(?:\b|(?<=_))word(?=\b|_)
^^^^^^^^^^^^^     ^^^^^^^

Here, (?:\b|(?<=_)) is a non-capturing group matching either a word boundary or a location preceded by _, and (?=\b|_) is a positive lookahead matching either a word boundary or a _ symbol.

Unfortunately, Python re won't allow (?<=\b|_), because a lookbehind pattern must be of fixed width (otherwise you get a "look-behind requires fixed-width pattern" error). That is why the left-hand side is written as an alternation outside the lookbehind.
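To see the restriction in practice, the sketch below tries to compile the rejected form and captures the error text. It assumes the standard library re module; the variable name message is only for illustration.

```python
import re

try:
    # \b has width 0 and _ has width 1, so the lookbehind is not fixed-width.
    re.compile(r"(?<=\b|_)word")
except re.error as exc:
    message = str(exc)
else:
    message = None  # compiled without error (not expected with the re module)

print(message)
```

The accepted form, (?:\b|(?<=_)), avoids the problem because each alternative is checked on its own: \b is a plain assertion, and the lookbehind contains only the one-character pattern _.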

A Python demo:

import re
rx = r"(?:\b|(?<=_))word(?=\b|_)"
s = "some_word_here and a word there"
print(re.findall(rx, s))  # ['word', 'word']

An alternative solution is to use custom word boundaries, (?<![^\W_]) on the left and (?![^\W_]) on the right:

rx = r"(?<![^\W_])word(?![^\W_])"

The [^\W_] class matches any word character except _ (that is, a letter or digit). The (?<![^\W_]) negative lookbehind fails the match if the character immediately before the search word is a letter or digit, so it requires the start of the string, a non-word character, or _ at that position. Likewise, the (?![^\W_]) negative lookahead fails the match if the next character is a letter or digit, so it requires the end of the string, a non-word character, or _ after the search word.
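A minimal sketch comparing the two patterns from this answer on a few made-up inputs. The names CUSTOM_BOUNDARY, LOOKAROUND and samples are illustrative only.

```python
import re

# Custom boundaries: no letter or digit directly before or after "word".
CUSTOM_BOUNDARY = re.compile(r"(?<![^\W_])word(?![^\W_])")
# First approach: word boundary or underscore on either side.
LOOKAROUND = re.compile(r"(?:\b|(?<=_))word(?=\b|_)")

samples = ["some_word_here", "a word there", "_word_", "word", "swordfish", "word2"]

results = {
    s: (bool(CUSTOM_BOUNDARY.search(s)), bool(LOOKAROUND.search(s)))
    for s in samples
}

for s, (custom, lookaround) in results.items():
    print(f"{s!r:18} custom={custom} lookaround={lookaround}")
```

On these inputs both patterns agree: the first four strings match, while "swordfish" and "word2" do not, because a letter or digit touches the search word.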

Tags:

python

regex

Aakash Goel
