Matching all numbers with regex using preg_match_all

I have some text and want to add a link to every 3-digit number in it.
I use preg_match_all with the pattern: (^|[^\d])(\d{3})($|[^\d])
Grouping is used so that links are added only to the numbers, not to their neighbors. The test cases are:

  1. a 123 234 b - Has to match 123 and 234
  2. a 123_234 b - Has to match 123 and 234
  3. aa123 234 b - Has to match 123 and 234
  4. a0123 234 b - Has to match only 234
  5. 123a234 b - Has to match 123 and 234
  6. a 123 234 - Has to match 123 and 234

Tests 2 and 3 work OK; the others fail because of the space between the two numbers.
How can I match both numbers when only one space separates them?

Andrey asked Feb 07 '17 16:02


2 Answers

You can "fix" your regex by replacing the last capturing group with a positive lookahead - (^|[^\d])(\d{3})(?=$|[^\d]) - so that the character after the number is checked but not consumed. In your pattern, the ($|[^\d]) group consumed the space after the first 3-digit chunk, so the leading (^|[^\d]) of the next match had nothing left to match. If you prefer this approach, I'd also replace [^\d] with the shorter \D.
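As a rough sketch of that lookahead fix in PHP (the function name linkWithLookahead and the "#" href are placeholders of mine, not something from the question), the first group has to be written back in the replacement because it is still consumed:

```php
<?php
// Sketch only: the question's pattern with its last group turned into a lookahead.
// linkWithLookahead and the "#" href are hypothetical placeholders.
function linkWithLookahead(string $text): string
{
    // $1 = preceding non-digit (empty at the start of the string), $2 = the number.
    // (?=$|\D) only peeks at the next character, so the next match can still use it.
    $pattern = '/(^|\D)(\d{3})(?=$|\D)/';

    return preg_replace($pattern, '$1<a href="#">$2</a>', $text) ?? $text;
}

foreach (['a 123 234 b', 'a0123 234 b', '123a234 b'] as $input) {
    echo linkWithLookahead($input), PHP_EOL;
}
```

With a single space between the numbers, both 123 and 234 should now be linked, while 0123 is left alone.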

I suggest using negative lookarounds since that way it looks "cleaner":

(?<!\d)\d{3}(?!\d)
^^^^^^^     ^^^^^^

See the regex demo

Details:

  • (?<!\d) - the current location must not be immediately preceded by a digit
  • \d{3} - 3 digits
  • (?!\d) - there must be no digit immediately to the right of the current location
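A minimal sketch of how the lookaround pattern could be used from PHP, run against the six test strings from the question. The function names and the "#" href are my own placeholders:

```php
<?php
// Sketch only: function names and the "#" href are hypothetical placeholders.

// Collect every standalone 3-digit number.
function findThreeDigitNumbers(string $text): array
{
    // Both lookarounds assert without consuming, so neighbouring
    // characters stay available for the next match.
    preg_match_all('/(?<!\d)\d{3}(?!\d)/', $text, $matches);

    return $matches[0];
}

// Wrap every standalone 3-digit number in a link.
function linkThreeDigitNumbers(string $text): string
{
    // $0 is the whole match, which here is exactly the 3 digits.
    return preg_replace('/(?<!\d)\d{3}(?!\d)/', '<a href="#">$0</a>', $text) ?? $text;
}

$cases = [
    'a 123 234 b', 'a 123_234 b', 'aa123 234 b',
    'a0123 234 b', '123a234 b', 'a 123 234',
];

foreach ($cases as $case) {
    echo $case, ' => ', implode(', ', findThreeDigitNumbers($case)), PHP_EOL;
    echo linkThreeDigitNumbers($case), PHP_EOL;
}
```

Since nothing but the digits is consumed, no capturing groups are needed and the replacement can refer to the whole match.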
Wiktor Stribiżew answered Sep 29 '22 11:09


Here are my two cents:

\d{4,}(*SKIP)(*FAIL)|(\d{3})

The regex example is here.

It means:

\d{4,}(*SKIP)(*FAIL)  -> match a run of 4 or more digits, then discard it and continue after it
|                     -> or
(\d{3})               -> match 3 digits and capture them

This way the regex matches ONLY runs of exactly 3 digits, and each one is available in capture group 1.
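Since the question uses preg_match_all, here is a small sketch of the same pattern used for matching rather than replacing. The function name captureThreeDigitRuns and the sample string are made up for illustration:

```php
<?php
// Sketch only: captureThreeDigitRuns is a hypothetical helper name.
function captureThreeDigitRuns(string $text): array
{
    // Longer digit runs are consumed by the first alternative and thrown away,
    // so only runs of exactly 3 digits ever reach capture group 1.
    preg_match_all('/\d{4,}(*SKIP)(*FAIL)|(\d{3})/', $text, $matches);

    return $matches[1];
}

// "12" is too short and "67890" is skipped, so only 345 should be captured.
print_r(captureThreeDigitRuns('12 345 67890'));
print_r(captureThreeDigitRuns('a0123 234 b'));
```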

Hope it helps.

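EDIT:

Added (*SKIP)(*FAIL) verbs.

Together, these two verbs force the match to fail at that point and make the engine resume after the skipped text, so runs of 4 or more digits are never touched. The replacement then applies only to the 3-digit numbers. (See the substitution part of the regex101 example.)

In PHP, the code looks like this: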

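<?php
// The six test strings from the question.
$arr = array(
    "a 123 234 b",
    "a 123_234 b",
    "aa123 234 b",
    "a0123 234 b",
    "123a234 b",
    "a 123 234"
);

// Runs of 4+ digits are matched and discarded; 3-digit runs land in group 1.
$regex = '/\d{4,}(*SKIP)(*FAIL)|(\d{3})/';

foreach ($arr as $item) {
    // Wrap each captured 3-digit number in a link.
    echo preg_replace($regex, '<a href="#">$1</a>', $item);
    echo "\r\n";
}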

Output:

a <a href="#">123</a> <a href="#">234</a> b
a <a href="#">123</a>_<a href="#">234</a> b
aa<a href="#">123</a> <a href="#">234</a> b
a0123 <a href="#">234</a> b
<a href="#">123</a>a<a href="#">234</a> b
a <a href="#">123</a> <a href="#">234</a>
JazZ answered Sep 29 '22 11:09