want to match word i.v.
case insensitive
have pattern
(?i)\bi\.v\.
but want a word boundary on the end
the above pattern fails in that it matches
i.v.x
but if I try to add a word boundary to the end
(?i)\bi\.v\.\b
it fails in that it does not even match i.v.
as I think the \b
is eating the literal . as . is a word break
need the \.
to be greedy
i want to match
sam i.v. sam
do not want to match
sam.i.v.
i.v.sam
This gets closer
(?i)\bi\.v\.\s$
But it fails to find i.v. at the end of a line
If your regular expression needs to match characters before or after \y, you can easily specify in the regex whether these characters should be word characters or non-word characters. If you want to match any word, \y\w+\y gives the same result as \m.+\M.
The . (dot) metacharacter is a wildcard and can match any single character (letter, digit, whitespace, everything). This overrides the matching of the literal period character, so in order to specifically match a period, you need to escape the dot with a backslash: \.
When the regexp engine (program module that implements searching for regexps) comes across \b, it checks that the position in the string is a word boundary. There are three different positions that qualify as word boundaries: At string start, if the first string character is a word character \w.
\b only matches between a word character (a letter, digit, or underscore) and a non-word character, or at the start/end of the string when the adjacent character is a word character. Therefore, it doesn't match after a . unless a word character immediately follows that dot.
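The behaviour described above can be checked with a small sketch. It assumes Python's re module, which the question does not name, so other regex flavors may differ in details; has_match is a hypothetical helper used only for illustration.

```python
import re

# The pattern from the question, with \b added at the end.
trailing_boundary = re.compile(r"(?i)\bi\.v\.\b")

def has_match(text):
    """Return True if the pattern is found anywhere in text."""
    return trailing_boundary.search(text) is not None

# After the final dot, \b requires a word character on the other side.
print(has_match("sam i.v. sam"))  # False: the dot is followed by a space
print(has_match("sam i.v."))      # False: the dot is followed by end of string
print(has_match("i.v.x"))         # True: the dot is followed by "x"
```

So the trailing \b accepts exactly the case the question wants to reject, and rejects the ones it wants to accept.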
If your intent is to make sure that no non-whitespace character follows after the dot, then you can specify that using a negative lookahead assertion:
(?i)\bi\.v\.(?!\S)
(?!\S)
means "Assert that the next character is not a non-whitespace character".
This may sound a bit convoluted - why the double negative? Why not (?=\s)
which means "Assert that the next character is a whitespace character"? Well, there is a subtle difference: The second version requires a whitespace character to be there; that means the regex would fail to match at the end of the string. The first regex handles that corner case as well.
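To make the difference between the two lookaheads concrete, here is a minimal comparison, again assuming Python's re module; the names negative and positive are just labels for this sketch.

```python
import re

negative = re.compile(r"(?i)\bi\.v\.(?!\S)")  # next char is not a non-whitespace char
positive = re.compile(r"(?i)\bi\.v\.(?=\s)")  # next char must be a whitespace char

for text in ["sam i.v. sam", "sam i.v.", "i.v.x"]:
    # Print whether each variant finds a match in the sample text.
    print(repr(text), bool(negative.search(text)), bool(positive.search(text)))
```

Only the second sample separates the two: with nothing after the final dot, (?!\S) still succeeds while (?=\s) fails.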
If you generally want the concept of "word boundary" to mean "space-delimited", then you need to replace the first \b
as well:
(?i)(?<!\S)i\.v\.(?!\S)
or the regex will match sam.i.v.
which you don't seem to want.
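As a quick check against the examples from the question, the following sketch runs the space-delimited pattern over them. It assumes Python's re module; is_match is a hypothetical helper, not part of any library.

```python
import re

space_delimited = re.compile(r"(?i)(?<!\S)i\.v\.(?!\S)")
leading_boundary = re.compile(r"(?i)\bi\.v\.(?!\S)")  # keeps the first \b

def is_match(text, pattern=space_delimited):
    """Return True if pattern is found anywhere in text."""
    return pattern.search(text) is not None

for text in ["sam i.v. sam", "sam i.v.", "I.V.", "sam.i.v.", "i.v.sam"]:
    print(repr(text), is_match(text))

# With the leading \b kept, "sam.i.v." is still accepted.
print(is_match("sam.i.v.", leading_boundary))
```

The first three samples match and the last two do not, while the variant that keeps the leading \b still accepts sam.i.v.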