
Match all substrings that end with 4 digits using regular expressions

I am trying to split a string in php, which looks like this:

ABCDE1234ABCD1234ABCDEF1234

Into an array of strings which, in this case, would look like this:

ABCDE1234
ABCD1234
ABCDEF1234

So the pattern is "an undefined number of letters, and then 4 digits, then an undefined number of letters and 4 digits etc."

I'm trying to split the string using preg_split like this:

$pattern = "#[0-9]{4}$#";
preg_split($pattern, $stringToSplit);

But it returns an array containing the full string (not split) in the first element.

I'm guessing the problem is my regex: I don't fully understand regular expressions, and I'm not sure I'm using them correctly.

So what would be the correct regex to use?

DevBob asked Nov 03 '16 13:11



2 Answers

PHP uses PCRE-style regexes which let you do lookbehinds. You can use this to see if there are 4 digits "behind" you. Combine that with a lookahead to see if there's a letter ahead of you, and you get this:

(?<=\d{4})(?=[a-z])

Notice the dotted lines on the Debuggex Demo page. Those are the points you want to split on.

In PHP this would be:

var_dump(preg_split('/(?<=\d{4})(?=[a-z])/i', 'ABCDE1234ABCD1234ABCDEF1234'));
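
To make the split points concrete, here is a minimal sketch of the same call with its result stored in a variable. The names `$input` and `$parts` are illustrative and not part of the original answer. Both assertions are zero-width, so no characters are consumed and each piece keeps its digits.

```php
<?php
// Illustrative variable names; the pattern is the one from the answer above.
$input = 'ABCDE1234ABCD1234ABCDEF1234';

// (?<=\d{4}) : the current position must be preceded by exactly-matched 4 digits
// (?=[a-z])  : the current position must be followed by a letter (/i ignores case)
// Nothing is consumed at the split point, so no characters are lost.
$parts = preg_split('/(?<=\d{4})(?=[a-z])/i', $input);

print_r($parts);
// $parts is ['ABCDE1234', 'ABCD1234', 'ABCDEF1234']
```

There is no split at the very end of the string, because no letter follows the final 1234, so no empty trailing element appears.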
asontu answered Sep 19 '22 08:09


You don't want preg_split, you want preg_match_all:

$str = 'ABCDE1234ABCD1234ABCDEF1234';
preg_match_all('/[a-z]+[0-9]{4}/i', $str, $matches);
var_dump($matches);

Output:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(9) "ABCDE1234"
    [1]=>
    string(8) "ABCD1234"
    [2]=>
    string(10) "ABCDEF1234"
  }
}
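
As a rough sketch of why the pattern from the question did not behave as hoped, and of how to pull a flat list out of `$matches`: the variables `$attempt`, `$count` and `$chunks` below are illustrative names, not part of the original answer.

```php
<?php
$str = 'ABCDE1234ABCD1234ABCDEF1234';

// The question's pattern: `$` anchors the 4 digits to the end of the string,
// so only the final "1234" matches, and preg_split drops it as the delimiter.
$attempt = preg_split('#[0-9]{4}$#', $str);
// $attempt is ['ABCDE1234ABCD1234ABCDEF', '']

// preg_match_all returns the number of full matches;
// the matched strings themselves are collected in $matches[0].
$count = preg_match_all('/[a-z]+[0-9]{4}/i', $str, $matches);
$chunks = $matches[0];

var_dump($attempt, $count, $chunks);
```

Since the pattern has no capture groups, `$matches` only has index 0, which is why the output above is a nested array.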
mister martin answered Sep 20 '22 08:09