
Regex using word boundary but word ends with a . (period)

Tags: .net, regex

I want to match the word i.v., case-insensitively.

I have this pattern:

(?i)\bi\.v\.

But I also want a word boundary at the end.
The above pattern fails in that it matches
i.v.x

But if I try to add a word boundary to the end:

(?i)\bi\.v\.\b

it fails in that it does not even match i.v. on its own. I think the \b is eating the literal . because . is a word break.
I need the \. to be greedy.

I want to match:
sam i.v. sam

I do not want to match:
sam.i.v.
i.v.sam

This gets closer:

(?i)\bi\.v\.\s$

But it fails to find i.v. at the end of a line.

asked Aug 01 '13 21:08 by paparazzo



1 Answer

\b only matches between a word character (a letter, digit, or underscore) and a non-word character, or at the start/end of the string when the adjacent character is a word character. Therefore, it doesn't match after a ., unless a word character immediately follows that dot.
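
To see this behaviour concretely, here is a minimal sketch. It uses Python's re module rather than .NET (an assumption made for illustration; \b and (?i) behave the same way for these inputs), and the helper name matches is hypothetical:

```python
import re

# The question's second pattern, with a trailing \b.
with_boundary = re.compile(r"(?i)\bi\.v\.\b")

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

# "." followed by " ": both are non-word characters, so there is no boundary.
print(matches(with_boundary, "sam i.v. sam"))  # False
# "." at the very end of the string: still no boundary, since "." is not a word character.
print(matches(with_boundary, "i.v."))          # False
# "." followed by "s": a boundary exists, so the unwanted case is the one that matches.
print(matches(with_boundary, "i.v.sam"))       # True
```

So the trailing \b is not consuming the dot; it simply asserts a position that does not exist after a dot unless a word character follows.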

If your intent is to make sure that no non-whitespace character follows after the dot, then you can specify that using a negative lookahead assertion:

(?i)\bi\.v\.(?!\S)

(?!\S) means "Assert that the next character is not a non-whitespace character".

This may sound a bit convoluted - why the double negative? Why not (?=\s) which means "Assert that the next character is a whitespace character"? Well, there is a subtle difference: The second version requires a whitespace character to be there; that means the regex would fail to match at the end of the string. The first regex handles that corner case as well.
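
The difference between the two lookaheads can be checked with a short sketch. It is written in Python's re module as an illustration (not .NET, though both lookaheads work the same way there), and the names negative, positive, and found are made up for this example:

```python
import re

negative = re.compile(r"(?i)\bi\.v\.(?!\S)")  # next character must not be non-whitespace
positive = re.compile(r"(?i)\bi\.v\.(?=\s)")  # next character must be whitespace

def found(pattern, text):
    """Return True if the pattern is found anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

for sample in ["sam i.v. sam", "sam i.v.", "i.v.x"]:
    print(repr(sample), found(negative, sample), found(positive, sample))
# 'sam i.v. sam' True True
# 'sam i.v.' True False
# 'i.v.x' False False
```

Only the end-of-string case separates the two: (?=\s) needs a character to be present, while (?!\S) is satisfied when nothing follows at all.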

If you generally want the concept of "word boundary" to mean "space-delimited", then you need to replace the first \b as well:

(?i)(?<!\S)i\.v\.(?!\S)

Otherwise the regex will still match the i.v. in sam.i.v., which you don't seem to want.
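
As a rough check of the space-delimited pattern against the examples from the question, here is a sketch in Python's re module (chosen for illustration; the lookbehind and lookahead used here are also supported by .NET). The extra inputs "I.V." and "sam i.v." and all variable names are assumptions added for the example:

```python
import re

space_delimited = re.compile(r"(?i)(?<!\S)i\.v\.(?!\S)")
leading_boundary = re.compile(r"(?i)\bi\.v\.(?!\S)")  # still uses \b at the front

should_match = ["sam i.v. sam", "I.V.", "sam i.v."]
should_not_match = ["sam.i.v.", "i.v.sam", "i.v.x"]

for text in should_match:
    print(repr(text), space_delimited.search(text) is not None)   # True for each
for text in should_not_match:
    print(repr(text), space_delimited.search(text) is not None)   # False for each

# With \b at the front, the "." before "i" counts as a boundary, so this matches.
print(leading_boundary.search("sam.i.v.") is not None)  # True
```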

answered Oct 17 '22 17:10 by Tim Pietzcker