preg_match fails in PHP 5.3 and later

Tags: php, preg-match

I'm not good with regular expressions, so I don't even know exactly what this one does:

echo preg_match('/^(([a-zA-Z0-9\x2d]{1,63}\x2e)*[a-zA-Z0-9\x2d]{1,63}){1,254}$/', 'example12345678.com>');

I took it from an older version of Zend Framework (1.5), which is outdated; the latest stable version of the framework no longer contains this regexp. Still, its behavior is curious, because I found no documented explanation or backward-incompatibility note in the official PHP resources.

The thing is that on PHP 5.2.* it works fine and returns 0. On PHP 5.3.10 and 5.4.0 (and most likely all of 5.3.* and 5.4.*, I presume) it returns FALSE, meaning "an error".

My question is: why? What is the error? Is it the regexp itself, some kind of recursion, or ambiguity in the rules? And if so, why does it work on PHP 5.2?
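When preg_match() returns false, PHP records the reason, and preg_last_error() (available since PHP 5.2) returns it as one of the PREG_*_ERROR constants. Below is a minimal sketch of a diagnostic helper; the name describe_preg_match() is hypothetical. Turning off pcre.jit (a setting that only exists on PHP 7 and later) and lowering pcre.backtrack_limit to 1000 are assumptions made purely so the failure reproduces on any build; they are not the settings from the servers described here.

```php
<?php
// Hypothetical helper (not part of PHP or Zend Framework): runs preg_match()
// and, when it returns false, names the PCRE error that caused it.
function describe_preg_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result !== false) {
        return 'matched: ' . $result;
    }
    // Map the integer returned by preg_last_error() to its constant name.
    $names = array(
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    // Later PHP versions define extra error codes; add them only if present.
    foreach (array('PREG_BAD_UTF8_OFFSET_ERROR', 'PREG_JIT_STACKLIMIT_ERROR') as $name) {
        if (defined($name)) {
            $names[constant($name)] = $name;
        }
    }
    $code = preg_last_error();
    return 'error: ' . (isset($names[$code]) ? $names[$code] : 'unknown code ' . $code);
}

// Demo-only assumption: disable the JIT (ini_set() just returns false on
// PHP versions without pcre.jit) and use a tiny backtrack limit so the
// failure reproduces regardless of the server's defaults.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

$zendPattern = '/^(([a-zA-Z0-9\x2d]{1,63}\x2e)*[a-zA-Z0-9\x2d]{1,63}){1,254}$/';
echo describe_preg_match($zendPattern, 'example12345678.com>'), "\n";
```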


Interestingly enough, if I change 'example12345678.com>' to 'example1234567.com>' (making it one or more characters shorter), it starts working and returns 0. If I change it to '123123123123123123123123123', it works too and returns 1.

UPD: I don't know yet whether this matters, but the PCRE versions here are 8.02 (PHP 5.2) vs. 8.12 (PHP 5.3).
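Since the bundled PCRE version and the pcre.* limits can differ between builds, it may help to record them on both the old and the new server before comparing behavior. A small sketch: the helper name pcre_environment() is hypothetical, while PCRE_VERSION (PHP 5.2.4+) and the pcre.backtrack_limit and pcre.recursion_limit settings are standard. The pcre.jit setting only exists on PHP 7 and later, so ini_get() returns false for it on older versions.

```php
<?php
// Hypothetical helper: collects the PHP/PCRE facts worth comparing between
// the old and the new server when a regex starts behaving differently.
function pcre_environment()
{
    return array(
        'php_version'     => PHP_VERSION,
        'pcre_version'    => PCRE_VERSION,
        'backtrack_limit' => ini_get('pcre.backtrack_limit'),
        'recursion_limit' => ini_get('pcre.recursion_limit'),
        // false on PHP < 7, where the pcre.jit setting does not exist
        'jit'             => ini_get('pcre.jit'),
    );
}

foreach (pcre_environment() as $key => $value) {
    printf("%-16s %s\n", $key, var_export($value, true));
}
```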


UPD2: I do understand what it's for, more or less, and I have no problem getting things working right now: as I said, updating Zend_Validate_* solves it. Let me describe my concern in other words:

Say I upgrade an important piece of software, switching from PHP 5.2 to PHP 5.3. I try to find information on every problem I could face (mostly by reading http://php.net/manual/en/migration53.php). The software is a bit old but not ancient; for example, it might use Zend Framework 1.5. I check, patch, and analyze the code and fix every BC break and deprecated feature. Even my unit tests run fine.

To my surprise, what is described in the question happens (to be precise, Zend_Validate_Hostname throws an exception). So now I want to know why I missed this one when upgrading and, more importantly, whether I should recheck every preg_match call (and every other PCRE-based function) in the app, trying all imaginable input data in an attempt to find similar "bug fixes".

That is, if it is a "bug fix" at all, because it looks like a new bug: it used to work as expected in PHP 5.2 and doesn't anymore.

I was hoping to get some clues to narrow down the search.
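One way to make failures like this visible across an application, instead of auditing every call with hand-picked inputs, is to route PCRE calls through a wrapper that turns a false return into an exception, so a test run or log entry points at the exact pattern. A minimal sketch, assuming a hypothetical strict_preg_match() helper and RuntimeException as the error type; it does not prove that other patterns are safe, it only stops them from failing silently.

```php
<?php
// Hypothetical wrapper (not a PHP or Zend Framework API): behaves like
// preg_match($pattern, $subject, $matches) but throws when PCRE reports
// an error, so a false return can no longer be mistaken for "no match".
function strict_preg_match($pattern, $subject, &$matches = null)
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        throw new RuntimeException(sprintf(
            'preg_match() failed (preg_last_error() = %d) for pattern %s',
            preg_last_error(),
            $pattern
        ));
    }
    return $result; // 0 or 1
}

try {
    $found = strict_preg_match(
        '/^(([a-zA-Z0-9\x2d]{1,63}\x2e)*[a-zA-Z0-9\x2d]{1,63}){1,254}$/',
        'example12345678.com>'
    );
    echo "result: $found\n";
} catch (RuntimeException $e) {
    // In an application, this is where the failure would be logged.
    echo $e->getMessage(), "\n";
}
```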

asked May 17 '12 22:05 by lcf


1 Answer

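That is an ugly regular expression. The problem is that there are too many ways the string might match: the outer {1,254} group can chop every run of letters and digits into repetitions of [a-zA-Z0-9\x2d]{1,63} in exponentially many ways, and the engine runs out of resources trying them all before it figures out that the string doesn't actually match. When PCRE hits one of its limits (pcre.backtrack_limit or pcre.recursion_limit), preg_match() returns false, and preg_last_error() tells you which limit it was.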
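To make "too many ways" concrete: a run of n label characters (n <= 63) can be cut into consecutive repetitions of the outer group in 2^(n-1) ways, because each of the n-1 gaps between characters either is or isn't a boundary. The sketch below counts those splits with a small dynamic program; count_label_splits() is a hypothetical name, and the count is only a lower bound on the engine's work, since the real backtracking also revisits the dotted part of the subject.

```php
<?php
// Hypothetical helper: counts the ways a run of $n label characters can be
// split into consecutive pieces of 1..$maxPiece characters, i.e. the ways
// the outer {1,254} group can carve it into [a-zA-Z0-9\x2d]{1,63} repetitions.
// (The 254-repetition cap is ignored; it does not bind for short runs.)
function count_label_splits($n, $maxPiece = 63)
{
    $ways = array(0 => 1); // one way to split an empty run: no pieces
    for ($i = 1; $i <= $n; $i++) {
        $ways[$i] = 0;
        // The last piece ends at position $i and has length $k.
        for ($k = 1; $k <= min($maxPiece, $i); $k++) {
            $ways[$i] += $ways[$i - $k];
        }
    }
    return $ways[$n];
}

// "example12345678" is 15 characters; "example1234567" is 14.
printf("%d splits\n", count_label_splits(strlen('example12345678'))); // 16384 splits
printf("%d splits\n", count_label_splits(strlen('example1234567')));  // 8192 splits
```

Every extra character doubles the count, which fits the observation in the question that removing a single character is enough to get a clean 0 instead of FALSE.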

Also, it looks like it's trying to match valid domain names, and it doesn't do that correctly (for example, it happily accepts labels that start or end with a hyphen). I would replace that call to preg_match with a call to this function instead:

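function is_valid_domain_name($string) {
    // A full domain name is at most 253 characters (without a trailing dot).
    if (strlen($string) > 253) {
        return false;
    }
    // One label: 1-63 letters, digits or hyphens, not starting or ending with '-'.
    $label = '(?!-)[a-zA-Z0-9-]{1,63}(?<!-)';
    // Up to 127 dot-separated labels; the D modifier stops $ from
    // also accepting a trailing newline.
    return preg_match("/^(?:$label\.){0,126}$label$/D", $string);
}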

It fails quickly on your problem string:

echo is_valid_domain_name('example12345678.com>'),"\n";
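A quick check with two made-up malformed hostnames illustrates why the Zend pattern is not a real domain-name validator: it accepts a label with leading and trailing hyphens, and a 301-character name built from 151 one-letter labels, because {1,254} limits how often the group repeats rather than how long the string is.

```php
<?php
// The Zend Framework 1.5 pattern quoted in the question, unchanged.
$zendPattern = '/^(([a-zA-Z0-9\x2d]{1,63}\x2e)*[a-zA-Z0-9\x2d]{1,63}){1,254}$/';

// Hyphens are allowed anywhere in a label, including at the start and end.
$hyphenated = '-example-.com';

// 151 one-letter labels: 301 characters, beyond the 253-character limit.
$tooLong = str_repeat('a.', 150) . 'a';

var_dump(strlen($tooLong));                      // int(301)
var_dump(preg_match($zendPattern, $hyphenated)); // int(1)
var_dump(preg_match($zendPattern, $tooLong));    // int(1)
```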
answered Nov 01 '22 08:11 by Mark Reed