How to avoid mixed letter-number when extracting chunks of numbers from string

Tags:

regex

php

I'm writing a PHP function to extract numeric ids from a string like:

$test = '123_1256_Foo';

At first I took two different approaches, one with preg_match_all():

$test2 = '123_1256_Foo';
preg_match_all('/[0-9]{1,}/', $test2, $matches);
print_r($matches[0]); // Result: 'Array ( [0] => 123 [1] => 1256 )'

and another with preg_replace() and explode():

$test = preg_replace('/[^0-9_]/', '', $test);
$output = array_filter(explode('_', $test));
print_r($output); // Results: 'Array ( [0] => 123 [1] => 1256 )'

Both work well as long as the string does not contain mixed letters and numbers, like:

$test2 = '123_1256_Foo2';

The result, as expected, is Array ( [0] => 123 [1] => 1256 [2] => 2 )

So I wrote another regex to get rid of mixed strings:

$test2 = preg_replace('/([a-zA-Z]{1,}[0-9]{1,}[a-zA-Z]{1,})|([0-9]{1,}[a-zA-Z]{1,}[0-9]{1,})|([a-zA-Z]{1,}[0-9]{1,})|([0-9]{1,}[a-zA-Z]{1,})|[^0-9_]/', '', $test2);
$output = array_filter(explode('_', $test2));
print_r($output); // Results: 'Array ( [0] => 123 [1] => 1256 )'

The problem here is evident too: more complicated patterns like Foo2foo12foo1 would pass the filter. This is where I got a bit stuck.

Recap:

  • Extract a variable amount of chunks of numbers from a string.
  • The string contains at least 1 number, and may contain other numbers and letters separated by underscores.
  • Only numbers not preceded or followed by letters must be extracted.
  • Only the numbers in the first half of the string matter.

Since only the first half is needed, I decided to split at the first occurrence of a letter or a mixed number-letter chunk with preg_split():

$test2 = '123_123_234_1Foo2';
$output = preg_split('/([0-9]{1,}[a-zA-Z]{1,})|[^0-9_]/', $test2, 2);
preg_match_all('/[0-9]{1,}/', $output[0], $matches);
print_r($matches[0]); // Results: 'Array ( [0] => 123 [1] => 123 [2] => 234 )'

My question is whether there is a simpler, safer or more efficient way to achieve this result.

MarioZ asked Oct 31 '25 21:10


2 Answers

This can be achieved without regex, using explode(), array_filter() and ctype_digit(), e.g.:

<?php

$str = '123_123_234_1Foo2';

$digits = array_filter(explode('_', $str), function ($substr) {
  return ctype_digit($substr);
});

print_r($digits);

This yields:

Array
(
    [0] => 123
    [1] => 123
    [2] => 234
)

Note that ctype_digit():

Checks if all of the characters in the provided string are numerical.

So $digits is still an array of strings, albeit numeric.
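
If actual integers are needed rather than numeric strings, the filtered chunks can be cast afterwards. This is only a minimal sketch using the same sample string; the variable name `$ints` is illustrative, and array_values() is added because array_filter() preserves the original keys:

```php
<?php

$str = '123_123_234_1Foo2';

// Keep only the chunks made up entirely of digits.
$digits = array_filter(explode('_', $str), 'ctype_digit');

// array_filter() preserves keys, so reindex first, then cast each chunk to int.
$ints = array_map('intval', array_values($digits));

var_dump($ints === [123, 123, 234]); // bool(true)
```

Casting drops leading zeros, so it is probably better to keep the strings if ids such as 007 must stay intact.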

Hope this helps :)

Darragh Enright answered Nov 02 '25 20:11


Use strtok

Regex isn't a magic bullet, and there are FAR simpler fixes for your problem, especially considering you're trying to split on a delimiter.

Any of the following approaches would be cleaner, and more maintainable, and the strtok() approach would probably perform better:

  1. Use explode to create and loop through an array, checking each value.
  2. Use preg_split to do the same, but with a more adaptable approach.
  3. Use strtok, as it is designed exactly for this use-case.
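
As a rough illustration of option 2, preg_split() can split on a character class so that several delimiters are handled in one pass. This is only a sketch: the function name `splitGetInts` and the choice of `_` and `-` as delimiters are assumptions, not something taken from the question.

```php
<?php

// Hypothetical helper: split on any run of "_" or "-" and keep digit-only chunks.
function splitGetInts(string $str): array
{
    // PREG_SPLIT_NO_EMPTY drops empty chunks caused by leading or trailing delimiters.
    $chunks = preg_split('/[_-]+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    $ints = [];
    foreach ($chunks as $chunk) {
        // Mixed chunks such as "1Foo2" fail ctype_digit() and are skipped.
        if (ctype_digit($chunk)) {
            $ints[] = (int) $chunk;
        }
    }

    return $ints;
}

print_r(splitGetInts('123_1256-42_1Foo2'));
```

The trade-off compared with explode() is flexibility: the delimiter set lives in the pattern, at the cost of invoking the regex engine.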

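Basic example for your case:

function strGetInts(string $str, string $delim) {
    $word = strtok($str, $delim);

    while (false !== $word) {
        // strtok() returns strings, so is_integer() would never match;
        // ctype_digit() checks that the token consists of digits only.
        if (ctype_digit($word)) {
            yield (int) $word;
        }
        $word = strtok($delim);
    }
}

$test2 = '123_1256_Foo';

foreach (strGetInts($test2, '_-') as $int) {
    echo $int, PHP_EOL; // prints 123, then 1256
}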

Note: the second argument to strtok() is a string containing every character to treat as a delimiter. Thus, my example splits the string on underscores or dashes.
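
To make the delimiter behaviour concrete, here is a small sketch that collects the tokens strtok() produces for a string mixing both delimiters. The helper name `tokens` and the sample string are made up for illustration.

```php
<?php

// Hypothetical helper: collect every token strtok() yields for the given delimiters.
function tokens(string $str, string $delims): array
{
    $out = [];
    // Each character in $delims is treated as a separate delimiter.
    for ($tok = strtok($str, $delims); $tok !== false; $tok = strtok($delims)) {
        $out[] = $tok;
    }

    return $out;
}

print_r(tokens('123_1256-Foo__7', '_-'));
```

Note that the doubled underscore before 7 does not produce an empty token, whereas explode() would return an empty string at that position.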

Additional Note: If and only if the string only needs to be split on a single delimiter (underscore only), a method using explode will likely result in better performance. For such a solution, see the other answer in this thread: https://stackoverflow.com/a/46937452/1589379 .
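
A sketch of that single-delimiter variant follows. It additionally stops at the first chunk that is not purely digits, which is one possible reading of the question's "only the first half matters" rule; that early exit and the name `leadingInts` are assumptions rather than part of either answer.

```php
<?php

// Hypothetical helper: collect digit-only chunks until the first non-numeric chunk.
function leadingInts(string $str, string $delim = '_'): array
{
    $ints = [];
    foreach (explode($delim, $str) as $chunk) {
        // An empty chunk (e.g. from "__") also fails ctype_digit() and ends the loop.
        if (!ctype_digit($chunk)) {
            break;
        }
        $ints[] = (int) $chunk;
    }

    return $ints;
}

print_r(leadingInts('123_123_234_1Foo2_99'));
```

Here the trailing 99 is ignored because it comes after the first mixed chunk; replacing `break` with `continue` would give the filter-everything behaviour of the other answer.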

Tony Chiboucas answered Nov 02 '25 20:11


