It seems that the HTML5 spec (and therefore ECMA-262) allows <input type="text" pattern="[0-9]/[0-9]" /> to match the string '0/0' even though the forward slash is not escaped. Web applications like Drupal would like to provide server-side validation for browsers that don't support HTML5 with something like:
<?php
preg_match('/^(' . $pattern . ')$/', $value);
?>
Unfortunately, once it is wrapped in / delimiters like this, the string '[0-9]/[0-9]' is not a valid PCRE regex. It appears that most if not all HTML5-capable browsers support both pattern="[0-9]/[0-9]" and pattern="[0-9]\/[0-9]", which raises the question: what can we use as a delimiter to run this pattern against a Perl-style regex?
We've filed a bug report against the W3C spec but are the browsers wrong here? Does the HTML5 spec need to be clarified? Is there a workaround we can use in PHP?
It is a valid regex if you use # instead of / for the delimiter. Example:
preg_match('#^('.$pattern.')$#', $value);
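As a quick check of that claim, here is a minimal sketch that runs both spellings of the pattern through # delimiters. The function name html5_pattern_matches_hash is made up for illustration and is not part of Drupal or PHP.

```php
<?php
// Hypothetical helper: validate $value against an HTML5-style pattern,
// using '#' instead of '/' as the PCRE delimiter.
function html5_pattern_matches_hash(string $pattern, string $value): bool
{
    // With '#' as delimiter, an unescaped '/' is just a literal slash,
    // and '\/' is an escaped literal slash, so both forms compile.
    return preg_match('#^(' . $pattern . ')$#', $value) === 1;
}

var_dump(html5_pattern_matches_hash('[0-9]/[0-9]', '0/0'));   // unescaped slash
var_dump(html5_pattern_matches_hash('[0-9]\/[0-9]', '0/0'));  // escaped slash
```

The same problem comes back if the pattern itself contains a #, so this only moves the collision to a less common character.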
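I recommend using the "\xFF" byte as the pattern delimiter: it can never occur in a valid UTF-8 string, so as long as the pattern is valid UTF-8 we can be sure it will not occur in the pattern. PHP reads the delimiters as single bytes and strips them before the pattern is compiled, so the stray byte causes no trouble.
Example: preg_match("\xFF(?:$pattern)\$\xFFADsu", $subject);
Please note the (?: ... ) group, the ADsu modifiers and the added $. The group keeps a top-level alternation such as a|b inside the anchors. A anchors the match at the start of the subject and D makes $ match only at the very end of it. The m modifier is deliberately left out, because PHP ignores D when m is set and $ would then also match before any newline. The u modifier requires valid UTF-8 bytes only in the pattern, but not in the delimiters around it.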
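A minimal sketch of this approach as a reusable function, assuming your PHP build accepts 0xFF as a delimiter as described above. The name html5_pattern_matches is hypothetical. This variant spells out ^ instead of using A, wraps the pattern in a non-capturing group, and uses only the D and u modifiers, since PHP ignores D whenever m is set.

```php
<?php
// Hypothetical helper (not from the thread): full-match $value against an
// HTML5-style $pattern, using the 0xFF byte as the PCRE delimiter.
function html5_pattern_matches(string $pattern, string $value): bool
{
    // An empty pattern with the u modifier only succeeds on valid UTF-8 input,
    // so this rejects any pattern that could contain a raw 0xFF byte.
    if (@preg_match('//u', $pattern) !== 1) {
        return false;
    }

    $delimiter = "\xFF";
    // ^(?: ... )$ keeps alternations inside the anchors; D pins $ to the very end.
    $regex = $delimiter . '^(?:' . $pattern . ')$' . $delimiter . 'Du';

    // @ silences the warning PHP raises for a pattern that fails to compile;
    // preg_match() then returns false, which is treated as "no match".
    return @preg_match($regex, $value) === 1;
}
```

Treating an invalid pattern as "no match" is a cautious default chosen for this sketch; a real validator might prefer to log the broken pattern instead.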
One of the problems with PHP's PCRE functions is that almost any non-alphanumeric character is legal as the start and end marker, depending on what makes the rest of the escaping easier. So #foo# is legal, /foo/ is legal, !foo! is legal, etc. Undelimited regexes, I'd say, are extremely dangerous for exactly that reason. It sounds like an HTML5 spec bug that it doesn't specify a delimiter.
Maybe in PHP, scan the string and pick a delimiter from a whitelist that is not present in the string? (E.g., if there's no /, use that; if there is, use #; if that's there too, use %; etc.)
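The whitelist idea could look roughly like the sketch below. The function names and the candidate list are assumptions made for illustration; any set of non-alphanumeric, non-backslash, non-whitespace characters would do.

```php
<?php
// Hypothetical helper: return the first candidate delimiter that does not
// occur anywhere in $pattern, or null if every candidate is taken.
function pick_delimiter(string $pattern, array $candidates = ['/', '#', '%', '~', '!', '@']): ?string
{
    foreach ($candidates as $delimiter) {
        if (strpos($pattern, $delimiter) === false) {
            return $delimiter;
        }
    }
    return null;
}

// Hypothetical helper: full-match $value against $pattern with a picked delimiter.
function validate_with_picked_delimiter(string $pattern, string $value): bool
{
    $delimiter = pick_delimiter($pattern);
    if ($delimiter === null) {
        // No safe delimiter left; rejecting is the cautious choice in this sketch.
        return false;
    }
    // D makes $ match only at the very end of the subject.
    return preg_match($delimiter . '^(?:' . $pattern . ')$' . $delimiter . 'D', $value) === 1;
}
```

Note that an escaped slash such as \/ still counts as "present", so '[0-9]\/[0-9]' falls through to # even though / would technically have worked there.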