
Dollar Sign "\$" in Regular Expressions with word boundaries "\b" (PHP / JavaScript)

I am aware that the issue of the dollar sign "$" in regex (here: in both PHP and JavaScript) has been discussed numerous times before. Yes, I know that I need to put a backslash "\" in front of it (or even two, depending on the string processing), so the correct way to match a dollar sign is "\$". Been there, done that, works fine.


But here's my new problem: Dollar signs "\$" next to word boundaries marked with "\b". ... My following examples can easily be reproduced on e.g. regexpal.com.

Let's start with the following text to search in:

Dollar 50

Dollars 50

$ 50

USD 50

My regex should find either "USD", "Dollar", or "$". Easy enough: Let's try

(USD|Dollar|\$)

Success: It finds the "$", the "USD", and both "Dollar" occurrences, including in "Dollars".

But let's try to skip the "Dollars" by adding word boundaries after the multiple choice:

(USD|Dollar|\$)\b

And this is trouble: "USD" is matched, "Dollar" is matched, "Dollars" is rejected ... But the single, properly backslashed (or escaped) "$" is rejected as well, although that worked just a second before.

It's not related to the multiple choice inside the brackets: Try just

\$

vs.

\$\b

and it's just the same: The first one matches the dollar sign, the second one doesn't.
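For anyone who prefers to reproduce this outside regexpal.com, here is a minimal JavaScript sketch of the same experiment. The helper name `matchingLines` is only an illustration and not part of any library; the sample lines are the ones listed above.

```javascript
// Sample lines from the question.
const lines = ["Dollar 50", "Dollars 50", "$ 50", "USD 50"];

// Hypothetical helper: returns the lines in which the pattern finds a match.
function matchingLines(pattern) {
  return lines.filter((line) => pattern.test(line));
}

const withoutBoundary = matchingLines(/(USD|Dollar|\$)/);
const withBoundary = matchingLines(/(USD|Dollar|\$)\b/);
const plainDollar = matchingLines(/\$/);
const dollarThenBoundary = matchingLines(/\$\b/);

console.log(withoutBoundary);    // all four lines
console.log(withBoundary);       // [ 'Dollar 50', 'USD 50' ]
console.log(plainDollar);        // [ '$ 50' ]
console.log(dollarThenBoundary); // []
```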


Another finding:

(USD|Dollar|\$) \b

with a blank " " between the ")" and the "\b" actually works. But that workaround might not be viable under all circumstances (in case there should be a non-whitespace word boundary).


It seems that the escaped dollar sign refuses to be found when word boundaries are involved.

I'd love to hear your suggestions to solve this mystery. -- Thanks a lot in advance!

asked Sep 30 '15 17:09 by GerZah


1 Answer

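It doesn't match, because in "$ 50" there isn't a word boundary immediately after the "$": "\b" only matches between a word character (letter, digit or underscore) and a non-word character, and both the "$" and the blank that follows it are non-word characters. There would be a boundary, however, if a word started immediately after the "$" - for example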

$Millions

will match.
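As a rough illustration of that rule, the JavaScript sketch below checks "\$\b" against a few inputs. The strings other than "$ 50" and "$Millions" are made-up test cases; the last entry shows why the blank workaround from the question works, since the boundary there sits between the blank and the "5".

```javascript
// \b matches only between a word character (\w: letter, digit, underscore)
// and a non-word character, or at a string edge next to a word character.
const dollarBoundary = /\$\b/;

const results = {
  dollarSpace: dollarBoundary.test("$ 50"),      // false: "$" and " " are both non-word
  dollarWord: dollarBoundary.test("$Millions"),  // true: boundary between "$" and "M"
  dollarDigit: dollarBoundary.test("$50"),       // true: digits are word characters
  dollarAtEnd: dollarBoundary.test("50 $"),      // false: end of string after a non-word char
  blankWorkaround: /\$ \b/.test("$ 50"),         // true: boundary between " " and "5"
};

console.log(results);
```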

What you probably want to do is to make the \b apply only to those cases where you really do want to match a word boundary - for example

(USD\b|Dollar\b|\$)

This will insist on there being a word boundary after "USD" and after "Dollar", but not after "$".
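A short JavaScript sketch of this pattern against the sample lines from the question follows. The helper `firstMatch` and the variant `currencyGrouped` (which factors the shared "\b" out of the two word alternatives) are illustrative additions and should behave the same on this input.

```javascript
const lines = ["Dollar 50", "Dollars 50", "$ 50", "USD 50"];

// \b is attached only to the alternatives that end in a word character.
const currency = /(USD\b|Dollar\b|\$)/;

// Illustrative variant: the shared \b is factored out of the word alternatives.
const currencyGrouped = /((?:USD|Dollar)\b|\$)/;

// Hypothetical helper: returns the captured currency marker, or null if none.
function firstMatch(pattern, line) {
  const m = line.match(pattern);
  return m ? m[1] : null;
}

const found = lines.map((line) => firstMatch(currency, line));
const foundGrouped = lines.map((line) => firstMatch(currencyGrouped, line));

console.log(found);        // [ 'Dollar', null, '$', 'USD' ]
console.log(foundGrouped); // [ 'Dollar', null, '$', 'USD' ]
```

"Dollars 50" yields no match because "Dollar" is followed by the word character "s", while the bare "$" no longer needs a boundary after it.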

answered Oct 20 '22 04:10 by psmears