I am aware that the issue of the dollar sign "$" in regex (here: in both PHP and JavaScript) has been discussed numerous times before: yes, I know that I need to put a backslash "\" in front of it (depending on the string processing, even two), and that the correct way to match a literal dollar sign is "\$". ... Been there, done that, works fine.
But here's my new problem: dollar signs "\$" next to word boundaries marked with "\b". ... The following examples can easily be reproduced on, e.g., regexpal.com.
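For the JavaScript side, a minimal sketch of the "one or two backslashes" point: a regex literal needs one backslash, while a pattern built from a string needs the backslash itself escaped. The variable names are illustrative, and PHP's own string-quoting rules are not covered here.

```javascript
// A regex literal: one backslash is enough, the pattern is \$.
const literal = /\$/;

// A pattern built from a string: "\\" becomes one backslash, so the pattern is again \$.
const fromString = new RegExp("\\$");

// Pitfall: in a string literal "\$" is just "$", so the pattern is the end-of-input anchor.
const broken = new RegExp("\$");

console.log(literal.source);                // \$
console.log(fromString.source);             // \$
console.log(broken.source);                 // $
console.log("$ 50".match(literal)[0]);      // $
console.log(broken.test("no dollar here")); // true: it matches the empty end of the string
```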
Let's start with the following text to search in:
Dollar 50
Dollars 50
$ 50
USD 50
My regex should find either "USD", "Dollar", or "$". Easy enough: Let's try
(USD|Dollar|\$)
Success: It finds the "$", the "USD", and both "Dollar" occurrences, including in "Dollars".
But let's try to skip the "Dollars" by adding word boundaries after the multiple choice:
(USD|Dollar|\$)\b
And this is trouble: "USD" is matched, "Dollar" is matched, "Dollars" is rejected ... But the single, properly backslashed (or escaped) "$" is rejected as well, although that worked just a second before.
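To reproduce this outside regexpal.com, a minimal JavaScript sketch runs both patterns over the sample text with the g flag (the variable names are illustrative):

```javascript
// The sample text from the question, one entry per line.
const text = "Dollar 50\nDollars 50\n$ 50\nUSD 50";

const withoutBoundary = /(USD|Dollar|\$)/g;
const withBoundary = /(USD|Dollar|\$)\b/g;

// match() with the g flag returns every matched substring, or null if there is none.
console.log(text.match(withoutBoundary)); // [ 'Dollar', 'Dollar', '$', 'USD' ]
console.log(text.match(withBoundary));    // [ 'Dollar', 'USD' ]
```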
It's not related to the alternatives inside the parentheses: try just
\$
vs.
\$\b
and it's just the same: The first one matches the dollar sign, the second one doesn't.
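The same comparison in JavaScript, reduced to the "$ 50" line, plus a check of which characters count as word characters:

```javascript
// The third line of the sample text.
const sample = "$ 50";

console.log(/\$/.test(sample));   // true
console.log(/\$\b/.test(sample)); // false

// \b needs a word character (\w, i.e. [A-Za-z0-9_]) on exactly one side,
// but "$" and the space after it are both non-word characters.
console.log(/\w/.test("$")); // false
console.log(/\w/.test(" ")); // false
```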
Another finding:
(USD|Dollar|\$) \b
with a blank " " between the ")" and the "\b" actually works. But that workaround might not be viable under all circumstances (for example, when the currency is followed by punctuation or by the end of the string instead of a space).
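A sketch of the space workaround in JavaScript. The string "Paid 50 USD." is a made-up example of the case where it breaks: there is a word boundary after "USD", but no space.

```javascript
const text = "Dollar 50\nDollars 50\n$ 50\nUSD 50";
const withSpace = /(USD|Dollar|\$) \b/g;

// The \b now sits between the space and the digit "5", which is a real boundary.
// Note that each match includes the trailing space.
console.log(text.match(withSpace)); // [ 'Dollar ', '$ ', 'USD ' ]

// No space after the currency, so nothing matches.
console.log("Paid 50 USD.".match(withSpace)); // null
```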
It seems that the escaped dollar sign refuses to be found when word boundaries are involved.
I'd love to hear your suggestions to solve this mystery. -- Thanks a lot in advance!
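The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries: before the first character in the string, if the first character is a word character; after the last character in the string, if the last character is a word character; and between two characters in the string, where one is a word character and the other is not.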
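The three rules can be checked by listing every index where \b matches. A minimal sketch; boundaryPositions is a hypothetical helper name:

```javascript
// Collect every index where \b matches. \b is zero-length, so each match is an empty string;
// matchAll still advances past empty matches, so the loop terminates.
function boundaryPositions(str) {
  const positions = [];
  for (const m of str.matchAll(/\b/g)) {
    positions.push(m.index);
  }
  return positions;
}

console.log(boundaryPositions("USD 50")); // [ 0, 3, 4, 6 ]
console.log(boundaryPositions("$ 50"));   // [ 2, 4 ]: none at 0 or 1, next to the "$"
```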
The RegExp \B metacharacter in JavaScript is the opposite of \b: it matches at any position that is not a word boundary, for example between two word characters in the middle of a word. For instance, \Bfor\B matches "for" only where it appears inside a longer word, not at its beginning or end. If a match is found, String.prototype.match() returns it; otherwise it returns null.
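A short JavaScript sketch of \B; the test strings are illustrative. The last line ties it back to the question: right after a "$" followed by a space there is no word boundary, so \B succeeds where \b fails.

```javascript
// \B matches wherever \b does not, e.g. between two word characters.
console.log(/\Bfor\B/.test("platform")); // true: "for" sits inside a longer word
console.log(/\Bfor\B/.test("for each")); // false: "for" starts a word

// match() returns null when nothing is found.
console.log("no match".match(/\Bxyz/)); // null

// "$" and " " are both non-word characters, so the position between them is not a boundary.
console.log(/\$\B/.test("$ 50")); // true
```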
If an unescaped dollar sign ($) is at the end of a regular expression, it matches the end of the input (or the end of each line, in multiline mode). If an entire regular expression is enclosed by a caret and a dollar sign (^like this$), it matches the entire input (or an entire line). So, to match strings consisting of exactly one character, use "^.$".
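A JavaScript sketch contrasting the $ anchor with the escaped literal \$, including the effect of the m (multiline) flag:

```javascript
// ^ and $ anchor the whole pattern, so this accepts exactly one character.
const singleChar = /^.$/;
console.log(singleChar.test("a"));  // true
console.log(singleChar.test("ab")); // false

// An unescaped $ is an anchor; \$ is a literal dollar sign.
console.log(/50$/.test("$ 50")); // true: "50" sits at the end of the input
console.log(/^\$/.test("$ 50")); // true: the input starts with a literal "$"

// With the m flag, ^ and $ also match at line breaks.
const lines = "Dollar 50\n$ 50\nUSD 50";
console.log(/^\$ 50$/.test(lines));  // false
console.log(/^\$ 50$/m.test(lines)); // true
```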
A word boundary \b is a test, just like ^ and $. When the regexp engine (the program module that implements searching for regexps) comes across \b, it checks whether the current position in the string is a word boundary.
It doesn't match, because there isn't a word boundary immediately after the "$" in "$ 50": both the "$" and the following space are non-word characters. There would be one, however, if a word started immediately after the "$"; for example, in "$Millions" it will match.
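A quick JavaScript check of this explanation; the inputs are illustrative:

```javascript
// \$\b succeeds only when a word character directly follows the "$".
console.log(/\$\b/.test("$Millions")); // true: "M" is a word character
console.log(/\$\b/.test("$50"));       // true: digits are word characters too
console.log(/\$\b/.test("$ 50"));      // false: a space is not
```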
What you probably want to do is to make the \b apply only to those cases where you really do want to match a word boundary, for example:
(USD\b|Dollar\b|\$)
This will insist on there being a word boundary after "USD" and after "Dollar", but not after "$".
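A minimal JavaScript check of this fix on the question's sample text. The second pattern, with the negative lookahead (?!\w) ("not followed by a word character"), is a swapped-in alternative, not part of the answer above; note that it rejects the "$" in "$Millions", which the suggested fix accepts.

```javascript
const text = "Dollar 50\nDollars 50\n$ 50\nUSD 50";

// The suggested fix: \b only after the word alternatives.
const currency = /(USD\b|Dollar\b|\$)/g;
console.log(text.match(currency)); // [ 'Dollar', '$', 'USD' ]

// Alternative (assumption, not from the answer): forbid a following word character instead.
const notFollowedByWord = /(USD|Dollar|\$)(?!\w)/g;
console.log(text.match(notFollowedByWord)); // [ 'Dollar', '$', 'USD' ]

// The two differ when a word character follows "$".
console.log("$Millions".match(currency));          // [ '$' ]
console.log("$Millions".match(notFollowedByWord)); // null
```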