I am trying to split a string in php, which looks like this:
ABCDE1234ABCD1234ABCDEF1234
Into an array of string which, in this case, would look like this:
ABCDE1234
ABCD1234
ABCDEF1234
So the pattern is "an undefined number of letters, and then 4 digits, then an undefined number of letters and 4 digits etc."
I'm trying to split the string using preg_split like this:
$pattern = "#[0-9]{4}$#";
preg_split($pattern, $stringToSplit);
And it returns an array containing the full string (not split) in the first element.
I'm guessing the problem here is my regex as I don't fully understand how to use them, and I am not sure if I'm using it correctly.
So what would be the correct regex to use?
To match any number from 0 to 9 we use \d in regex. It will match any single digit number from 0 to 9. \d means [0-9] or match any number from 0 to 9. Instead of writing 0123456789 the shorthand version is [0-9] where [] is used for character range.
?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).
Match any specific character in a setUse square brackets [] to match any characters in a set. Use \w to match any single alphanumeric character: 0-9 , a-z , A-Z , and _ (underscore). Use \d to match any single digit. Use \s to match any single whitespace character.
This operator is similar to the match-zero-or-more operator except that it repeats the preceding regular expression at least once; see section The Match-zero-or-more Operator ( * ), for what it operates on, how some syntax bits affect it, and how Regex backtracks to match it.
PHP uses PCRE-style regexes which let you do lookbehinds. You can use this to see if there are 4 digits "behind" you. Combine that with a lookahead to see if there's a letter ahead of you, and you get this:
(?<=\d{4})(?=[a-z])
Notice the dotted lines on the Debuggex Demo page. Those are the points you want to split on.
In PHP this would be:
var_dump(preg_split('/(?<=\d{4})(?=[a-z])/i', 'ABCDE1234ABCD1234ABCDEF1234'));
You don't want preg_split
, you want preg_match_all
:
$str = 'ABCDE1234ABCD1234ABCDEF1234';
preg_match_all('/[a-z]+[0-9]{4}/i', $str, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(3) {
[0]=>
string(9) "ABCDE1234"
[1]=>
string(8) "ABCD1234"
[2]=>
string(10) "ABCDEF1234"
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With