 

Expected lifespan of ereg, migrating to preg [duplicate]

Tags: regex, php, ereg

I work on a large PHP application (>1 million lines, 10 years old) which makes extensive use of ereg and ereg_replace: currently 1,768 unique regular expressions in 516 classes.

I'm well aware why ereg is being deprecated, but migrating to preg could clearly be highly involved.

Does anyone know how long ereg support is likely to be maintained in PHP, and/or have any advice for migrating to preg on this scale? Is automated translation from ereg to preg impossible or impractical, as I suspect?

Oliver Emberton asked May 03 '11 14:05



3 Answers

I'm not sure when ereg will be removed, but my bet is as of PHP 6.0.

Regarding your second issue, translating ereg to preg doesn't seem that hard. If your application has more than 1 million lines, you surely have the resources to have someone do this job in a week at most. I would grep for all the ereg calls (ereg, eregi, ereg_replace, eregi_replace) in your code and set up some macros in your favorite IDE (simple stuff like adding delimiters, modifiers and so on).

I bet most of the 1,768 regexes can be ported using a macro, and the rest with a good pair of eyes.
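As a rough illustration of the mechanical part such a macro would do, here is a minimal sketch of a hypothetical helper, `posix_to_pcre()`. It wraps a POSIX pattern in `~` delimiters, escapes any literal `~`, and appends the `i` modifier for eregi-style calls. It assumes the pattern needs no other changes, which won't hold for every regex: leftmost-longest semantics are not preserved, and a pattern that already contains `\~` would need manual attention. PCRE does understand POSIX bracket classes such as `[[:digit:]]`, so those can usually stay as they are.

```php
<?php
// Hypothetical helper (not part of PHP): converts a POSIX (ereg) pattern
// into a PCRE pattern string. Only handles delimiters, delimiter escaping
// and the case-insensitive modifier; matching semantics are NOT translated.
function posix_to_pcre(string $pattern, bool $caseInsensitive = false): string
{
    // Escape literal "~" characters so they don't terminate the pattern.
    $escaped = addcslashes($pattern, '~');

    // Wrap in "~" delimiters; eregi-style calls get the "i" modifier.
    return '~' . $escaped . '~' . ($caseInsensitive ? 'i' : '');
}

// ereg('^[a-z]+$', $s)  becomes  preg_match(posix_to_pcre('^[a-z]+$'), $s)
echo posix_to_pcre('^[a-z]+$'), "\n";    // ~^[a-z]+$~
echo posix_to_pcre('^a~b$', true), "\n"; // ~^a\~b$~i
```

A fixed delimiter keeps the macro simple; `~` is chosen here only because it rarely appears in patterns.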

Another option might be to write wrappers around the ereg functions if they are not available, implementing the changes as needed:

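if (!function_exists('ereg'))
{
    // Fallback when the POSIX regex extension is missing: wrap the pattern
    // in "~" delimiters (escaping any literal "~") and hand it to PCRE.
    // $regs is optional, as in the original ereg() signature.
    // Return values follow preg_match() (1/0), which differs from ereg() in some cases.
    function ereg($pattern, $string, &$regs = null)
    {
        return preg_match('~' . addcslashes($pattern, '~') . '~', $string, $regs);
    }
}

if (!function_exists('eregi'))
{
    // Same as above, using the "i" modifier for case-insensitive matching.
    function eregi($pattern, $string, &$regs = null)
    {
        return preg_match('~' . addcslashes($pattern, '~') . '~i', $string, $regs);
    }
}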

You get the idea. The PEAR package PHP_Compat might also be a viable solution.
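Following the same idea, fallbacks for ereg_replace() and eregi_replace() might look like the sketch below. It reuses the `~` delimiter trick from the wrappers above and is not a drop-in replacement: besides the leftmost-longest difference, preg_replace() also treats `$n` and `${n}` in the replacement string as backreferences, so replacements containing a literal dollar sign followed by a digit should be reviewed.

```php
<?php
// Sketch: define ereg_replace()/eregi_replace() on top of preg_replace()
// only when the native POSIX functions are not available.
if (!function_exists('ereg_replace'))
{
    function ereg_replace($pattern, $replacement, $string)
    {
        // "\\1"-style backreferences in $replacement are understood by preg_replace() too.
        return preg_replace('~' . addcslashes($pattern, '~') . '~', $replacement, $string);
    }
}

if (!function_exists('eregi_replace'))
{
    function eregi_replace($pattern, $replacement, $string)
    {
        // Case-insensitive variant via the "i" modifier.
        return preg_replace('~' . addcslashes($pattern, '~') . '~i', $replacement, $string);
    }
}

echo ereg_replace('([0-9]+)-([0-9]+)', '\\2-\\1', 'range 10-20'), "\n"; // range 20-10
```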


Differences from POSIX regex

As of PHP 5.3.0, the POSIX Regex extension is deprecated. There are a number of differences between POSIX regex and PCRE regex. This page lists the most notable ones that are necessary to know when converting to PCRE.

  1. The PCRE functions require that the pattern is enclosed by delimiters.
  2. Unlike POSIX, the PCRE extension does not have dedicated functions for case-insensitive matching. Instead, this is supported using the /i pattern modifier. Other pattern modifiers are also available for changing the matching strategy.
  3. The POSIX functions find the longest of the leftmost matches, but PCRE stops at the first valid match. If the string doesn't match at all, this makes no difference, but if it does match, it may have dramatic effects on both the resulting match and the matching speed. To illustrate this difference, consider the following example from "Mastering Regular Expressions" by Jeffrey Friedl. Using the pattern one(self)?(selfsufficient)? on the string oneselfsufficient with PCRE will result in matching oneself, but using POSIX the result will be the full string oneselfsufficient. Both (sub)strings match the original string, but POSIX requires that the longest be the result.
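The Friedl example from item 3 can be reproduced with preg_match() directly. This sketch only demonstrates PCRE's behavior. The anchored variant shows that forcing a full-string match happens to recover the longest result for this input, but it is not a general way to emulate POSIX leftmost-longest semantics.

```php
<?php
// PCRE tries the greedy (self)? first and accepts the first overall match
// it finds, so (selfsufficient)? is skipped and the match stops early.
preg_match('~one(self)?(selfsufficient)?~', 'oneselfsufficient', $m);
echo $m[0], "\n"; // oneself

// Anchoring with ^...$ makes PCRE backtrack (dropping the "self" group)
// until the whole string matches, which here equals the POSIX result.
preg_match('~^one(self)?(selfsufficient)?$~', 'oneselfsufficient', $m);
echo $m[0], "\n"; // oneselfsufficient
```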
Alix Axel answered Oct 05 '22 14:10



My intuition says that they are never going to remove ereg on purpose. PHP still supports really old and deprecated stuff like register_globals. There are simply too many outdated apps out there. There is, however, a small chance that the extension will have to be removed because someone finds a serious vulnerability and there's nobody to fix it.

In any case, it's worth noting that:

  1. You are not forced to upgrade your PHP installation. It's pretty common to keep outdated servers around to run legacy apps.

  2. The PHP_Compat PEAR package offers plain PHP versions of some native functions. If ereg disappears, it may well be added there.


BTW, PHP 6 is in fact dead. They realised that making PHP fully Unicode compliant was harder than they thought, and they are rethinking it all. The conclusion: you can never make perfect predictions.

Álvaro González answered Oct 05 '22 15:10



I had this problem on a much smaller scale, an application of more like 10,000 lines. In every case, all I needed to do was switch to preg_replace() and put delimiters around the regex pattern.

Anyone should be able to do that - even a non-programmer can be given a list of filenames and line numbers.

Then just run your tests to watch for any failures that can be fixed.
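To produce a list of filenames and line numbers like the one described above, something along the lines of this sketch could work. `find_ereg_calls()` and `scan_directory()` are hypothetical helpers, and the regex is a crude heuristic for the POSIX-extension functions (ereg, eregi, ereg_replace, eregi_replace, split, spliti): it will also flag method calls such as `$obj->split(...)` and matches inside comments or strings, so treat the output as a worklist to review rather than an exact inventory.

```php
<?php
// Hypothetical helper: return 1-based line numbers of lines that appear
// to call an ereg-family function.
function find_ereg_calls(string $source): array
{
    $hits = [];
    foreach (preg_split('/\R/', $source) as $i => $line) {
        // \b keeps names like preg_split() or my_ereg_helper() from matching;
        // \s*\( requires an opening parenthesis after the function name.
        if (preg_match('/\b(?:ereg|eregi|ereg_replace|eregi_replace|split|spliti)\s*\(/', $line)) {
            $hits[] = $i + 1;
        }
    }
    return $hits;
}

// Hypothetical helper: print "path:line" for every hit in *.php files under $dir.
function scan_directory(string $dir): void
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $source = file_get_contents($file->getPathname());
        if ($source === false) {
            continue; // unreadable file, skip it
        }
        foreach (find_ereg_calls($source) as $lineNo) {
            echo $file->getPathname(), ':', $lineNo, "\n";
        }
    }
}
```

Calling, for example, `scan_directory('/path/to/app')` (a placeholder path) would print one `path:line` entry per suspected call, which can then be split up and handed out.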

The ereg functions will be removed in PHP 6, by the way: http://jero.net/articles/php6.

Dan Blows answered Oct 05 '22 14:10
