 

Regex failing when pattern involves dollar sign ($)

Tags: regex, php

I'm running into a bit of an issue when it comes to matching subpatterns that involve the dollar sign. For example, consider the following chunk of text:

Regular Price: $20.50       Final Price: $15.20
Regular Price: $18.99       Final Price: $2.25
Regular Price: $11.22       Final Price: $33.44
Regular Price: $55.66       Final Price: $77.88

I was attempting to match the Regular/Final price sets with the following regex, but it simply wasn't working (no matches at all):
preg_match_all("/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U", $data, $matches);

I escaped the dollar sign, so what gives?

Mr. Llama asked Mar 18 '11 21:03


2 Answers

Inside a double-quoted string, PHP treats the backslash as an escape character for the $. The PHP parser removes the backslash before preg_match_all ever sees the pattern:

$r = "/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U";
var_dump($r);

Output (ideone):

"/Regular Price: $(\d+\.\d{2}).*Final Price: $(\d+\.\d{2})/U"
                 ^                           ^
              the backslashes are no longer there

To fix this use a single quoted string instead of a double quoted string:

preg_match_all('/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U',
               $data,
               $matches);

See it working online: ideone
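As a rough end-to-end check, here is a minimal sketch that runs the single-quoted pattern against the sample text from the question. The nowdoc wrapper and the variable names ($data, $count) are just my choices for illustration; the pattern itself is unchanged.

```php
<?php
// Sample text copied from the question. A nowdoc (<<<'TXT') performs no
// variable interpolation, so the dollar signs stay as plain text.
$data = <<<'TXT'
Regular Price: $20.50       Final Price: $15.20
Regular Price: $18.99       Final Price: $2.25
Regular Price: $11.22       Final Price: $33.44
Regular Price: $55.66       Final Price: $77.88
TXT;

// Single-quoted pattern: PHP leaves \$ untouched, so PCRE receives an
// escaped (literal) dollar sign.
$pattern = '/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U';

$count = preg_match_all($pattern, $data, $matches);

// $matches[1] holds the regular prices, $matches[2] the final prices.
foreach ($matches[1] as $i => $regular) {
    echo $regular, ' -> ', $matches[2][$i], PHP_EOL;
}
```

With the four sample lines, preg_match_all should report four matches, one Regular/Final pair per line.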

Mark Byers answered Nov 03 '22 18:11


I know this question is a little old, but I found it while looking for the answer to the same problem. It was at the top of the search engine rankings, so I figured it would be good to explain a simple alternative, and why this happens with double-quoted strings ( " ).

The regular expression I was using contained plenty of single quote characters ( ' ), so I wasn't keen on wrapping the expression in single quotes and escaping every one of them.

My solution was to "double escape" the dollar sign. In your example, it would look something like this:

"/Regular Price: \\\$(\d+\.\d{2}).*Final Price: \\\$(\d+\.\d{2})/U";

Note that the dollar sign is now preceded by three backslashes: \\\.

Basically, there are two "levels" of interpretation: PHP's and the regex engine's. With a single backslash, PHP treats \$ as an escaped dollar sign (a literal $ rather than the start of a variable), so it consumes the backslash and produces the string shown in Mark's answer. The regex engine then receives a bare $, which it interprets as the end-of-string anchor rather than a literal dollar sign.
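To see what the regex engine does with a bare $, here is a small sketch. It uses single-quoted patterns so that PHP passes them through unchanged; the shortened pattern and the variable names ($broken, $fixed, $subject) are made up for illustration.

```php
<?php
$subject = 'Price: $20.50';

// What PCRE ends up with when PHP has already eaten the backslash:
// the bare $ asserts "end of subject", so nothing can follow it.
$broken = '/Price: $(\d+\.\d{2})/';

// What PCRE ends up with when the backslash survives:
// \$ is a literal dollar sign.
$fixed = '/Price: \$(\d+\.\d{2})/';

$brokenHits = preg_match($broken, $subject);
$fixedHits  = preg_match($fixed, $subject, $m);

var_dump($brokenHits); // int(0)
var_dump($fixedHits);  // int(1)
var_dump($m[1]);       // string(5) "20.50"
```

The first pattern can never match here, because it demands digits after the end of the subject.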

With the "double escaped" dollar sign, PHP reads \\\$ as two escape sequences, \\ and \$. The first yields a literal backslash and the second a literal dollar sign, so only \$ remains after PHP's interpretation. PHP therefore sends the literal string

"/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U";

to the regex engine, which interprets \$ as a literal $ character rather than as the end-of-string anchor. It is important to keep both layers of interpretation in mind: PHP and the regex engine each have their own escaping rules, and some characters need as many as four backslashes (matching a single literal backslash, for example).
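A quick way to convince yourself of the two layers is to compare the strings PHP actually builds. This is only a sketch, and the variable names are mine:

```php
<?php
$doubleOne   = "\$";   // PHP consumes the backslash: the string is just $
$doubleThree = "\\\$"; // \\ becomes \ and \$ becomes $: the string is \$
$single      = '\$';   // single quotes leave \$ as it is: the string is \$

var_dump($doubleOne);               // string(1) "$"
var_dump($doubleThree === $single); // bool(true)

// The four-backslash case: PHP turns \\\\ into \\, which the regex engine
// then reads as one literal backslash.
$backslashPattern = '/\\\\/';
$backslashHits = preg_match($backslashPattern, 'a\b');

var_dump($backslashHits); // int(1)
```

So "\\\$" and '\$' hand the regex engine exactly the same two characters.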

Single quote strings don't have this problem, since to use a variable $foo in a string, we would have to write

'Hello '. $foo .'!';

instead of

"Hello $foo!";

as we can in double-quoted strings. Unlike double-quoted strings, single-quoted strings do not expand variables inside them (a variable has to be concatenated, as in the example above); a $ is treated as plain text. Since the dollar sign no longer needs escaping for PHP's sake, we can get away with just
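The difference between the two quoting styles is easy to check directly. A minimal sketch, with $foo given an arbitrary value for illustration:

```php
<?php
$foo = 'world'; // arbitrary value, only here to show interpolation

$interpolated = "Hello $foo!";           // double quotes expand $foo
$literal      = 'Hello $foo!';           // single quotes keep $foo as text
$concatenated = 'Hello ' . $foo . '!';   // concatenation with single quotes

var_dump($interpolated); // string(12) "Hello world!"
var_dump($literal);      // string(11) "Hello $foo!"
var_dump($concatenated === $interpolated); // bool(true)
```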

'/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U'

which sends \$ to the regex engine, exactly as \\\$ does in a double-quoted string.

Which style you use is a matter of personal preference, or of whichever is easier for the pattern at hand.

TL;DR: Use \$ in single-quoted strings like '/Hello \$bob/is', and \\\$ in double-quoted strings like "/Hello \\\$bob/is".

shmeeps answered Nov 03 '22 17:11