
How to split a string by repeated characters in PHP?

Tags: string, php, binary

I'm trying to split a binary string into an array of runs of repeated characters.

For example, the string 10001101 split with this function would give:

    $arr[0] = '1';
    $arr[1] = '000';
    $arr[2] = '11';
    $arr[3] = '0';
    $arr[4] = '1';

(I tried to make myself clear, but if you still don't understand, my question is the same as this one but for PHP, not Python)

R__ asked Oct 18 '15 10:10


2 Answers

You can use preg_split like so:

Example:

$in = "10001101";
$out = preg_split('/(.)(?!\1|$)\K/', $in);

print_r($out);

Output:

Array
(
    [0] => 1
    [1] => 000
    [2] => 11
    [3] => 0
    [4] => 1
)

The regex:

  • (.) - match a single character and capture it
  • (?!\1|$) - negative lookahead: succeeds only if the next character differs from the one just captured and we are not at the end of the string.
  • \K - keeps the text matched so far out of the overall regex match, making this match zero-width.

Note: this does not work in PHP versions prior to 5.6.13 as there was a bug involving bump-along behavior with \K.
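To see what the |$ part of the lookahead is doing, here is a small comparison sketch. The second pattern (without |$) is not from the answer; it is a hypothetical variant added only to show the difference, and the variable names are made up for illustration. It assumes a PHP version where the \K pattern works, as noted above.

```php
<?php
$in = "10001101";

// Pattern from the answer: the "|$" alternative blocks a split after the last character.
$withEnd = preg_split('/(.)(?!\1|$)\K/', $in);

// Hypothetical variant without "|$", shown only for comparison:
// the lookahead also succeeds at the end of the string, so one extra split happens there.
$withoutEnd = preg_split('/(.)(?!\1)\K/', $in);

var_dump(count($withEnd));    // int(5)
var_dump(count($withoutEnd)); // int(6), the last element is an empty string
```

Without |$ you would have to pass PREG_SPLIT_NO_EMPTY (or drop the last element yourself) to get rid of the trailing empty string.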


An alternative regex that works in earlier versions as well is:

$out = preg_split('/(?<=(.))(?!\1|$)/', $in);

This uses a lookbehind rather than \K in order to make the match zero-width.
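As a rough sketch of how the lookbehind version could be reused, it can be wrapped in a small function. The name split_runs is made up for this example and is not part of PHP; the pattern itself is the one shown above, and it is not limited to 0 and 1.

```php
<?php
// Hypothetical helper name; the pattern is the lookbehind variant from the answer.
function split_runs($in)
{
    // Split at every position where the previous character differs
    // from the next one, except at the end of the string.
    return preg_split('/(?<=(.))(?!\1|$)/', $in);
}

print_r(split_runs('10001101'));   // 1, 000, 11, 0, 1
print_r(split_runs('aabbccddee')); // aa, bb, cc, dd, ee
```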

user3942918 answered Oct 20 '22 16:10


<?php
$s = '10001101';
preg_match_all('/((.)\2*)/',$s,$m);
print_r($m[0]);
/*
Array
(
    [0] => 1
    [1] => 000
    [2] => 11
    [3] => 0
    [4] => 1
)
*/
?>

Matches runs of one or more repeated characters. The regex captures the subject character in the second capture group ((.), stored in $m[2]), while the first capture group contains the entire run (((.)\2*), stored in $m[1]). $m[0] holds the full-pattern matches, which here are identical to $m[1]. With preg_match_all, it does this globally over the entire string. This can be applied to any string, e.g. 'aabbccddee'. If you want to limit it to just 0 and 1, use [01] instead of . in the second capture group.

Keep in mind that the result may be empty (for example, when the input is an empty string), so first check that there are matches, e.g. with !empty($m[0]), before you use it.
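A short sketch of what ends up where in the match array, assuming the default PREG_PATTERN_ORDER. The input '1100a11' and the variable names $bin and $none are made up for illustration.

```php
<?php
$s = '10001101';
preg_match_all('/((.)\2*)/', $s, $m);

print_r($m[1]); // group 1: the runs, same content as $m[0]
print_r($m[2]); // group 2: one character per run: 1, 0, 1, 0, 1

// Restricting the character class to binary digits skips everything else.
preg_match_all('/(([01])\2*)/', '1100a11', $bin);
print_r($bin[0]); // 11, 00, 11

// With no matches the sub-arrays exist but are empty.
preg_match_all('/((.)\2*)/', '', $none);
var_dump(empty($none[0])); // bool(true)
```

Because $none[0] still exists as an empty array, empty() (or count()) is the more telling check here than isset().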

zamnuts answered Oct 20 '22 16:10