
Regular expression in php with preg_match_all




I would like to match strings of the form shown in the examples below, where each bracketed part is an optional unit. Ideally, I would like to have something like

preg_match_all('complicated regex', $mystring, $matches);

with $matches returning the following:

  • $mystring = "abc(xyz)[123]R" gives $matches = array(0 => "abc", 1 => "xyz", 2 => "123", 3 => "R")
  • $mystring = "abc(xyz)R" gives $matches = array(0 => "abc", 1 => "xyz", 2 => "", 3 => "R")
  • $mystring = "abc[123]R" gives $matches = array(0 => "abc", 1 => "", 2 => "123", 3 => "R")
  • $mystring = "abc(xyz)[123]" gives $matches = array(0 => "abc", 1 => "xyz", 2 => "123", 3 => "")
  • $mystring = "abc" gives $matches = array(0 => "abc", 1 => "", 2 => "", 3 => "")

I hope you get the point. I tried as follows:

preg_match_all("/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/", "foo(dd)[sdfgh]", $matches);

for which $matches[0] is

    [0] => foo
    [1] => 
    [2] => dd
    [3] => 
    [4] => 
    [5] => sdfgh
    [6] => 
    [7] => 

Why do I get the additional empty matches? How can I avoid them and get the results I need (either in $matches or in $matches[0])?

Alex asked Nov 03 '22 21:11


2 Answers

How about:



preg_match_all("/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/", "abc[123]R", $matches); 


    [0] => Array
            [0] => abc[123]R

    [1] => Array
            [0] => abc

    [2] => Array
            [0] => 

    [3] => Array
            [0] => 123

    [4] => Array
            [0] => R



The regular expression:

(?-imsx:^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$)

matches as follows:

NODE                     EXPLANATION
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
  ^                        the beginning of the string
  (                        group and capture to \1:
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
  )                        end of \1
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
    \(                       '('
    (                        group and capture to \2:
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
    )                        end of \2
    \)                       ')'
  )?                       end of grouping
  (?:                      group, but do not capture:
    \[                       '['
    (                        group and capture to \3:
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
    )                        end of \3
    \]                       ']'
  )                        end of grouping
  (                        group and capture to \4 (optional
                           (matching the most amount possible)):
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
  )?                       end of \4 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \4)
  $                        before an optional \n, and the end of the
                           string
)                        end of grouping
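
One caveat with the pattern as posted: the square-bracket group has no trailing ?, so inputs without a [...] part (such as abc(xyz)R or plain abc) will not match. Below is a minimal sketch of a variant that makes that group optional as well, assuming the five examples from the question are the target. The helper name split_units is hypothetical, and preg_match is used instead of preg_match_all because an anchored pattern can match at most once.

```php
<?php
// Hypothetical helper (not from the answer): splits a string such as
// "abc(xyz)[123]R" into its four units, using "" for any missing unit.
function split_units(string $s): ?array
{
    // Same pattern as above, except that the [...] group is also optional
    // and the trailing (\w*) is no longer wrapped in an optional quantifier.
    $pattern = '/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])?(\w*)$/';

    if (preg_match($pattern, $s, $m) !== 1) {
        return null; // the input does not have the expected shape
    }

    // Drop $m[0] (the whole match) and keep the four captured units.
    return array_slice($m, 1);
}

// Print the result for each example input from the question.
foreach (['abc(xyz)[123]R', 'abc(xyz)R', 'abc[123]R', 'abc(xyz)[123]', 'abc'] as $input) {
    echo $input, ' => ', json_encode(split_units($input)), "\n";
}
```

Because the final (\w*) always takes part in the match, PHP fills the skipped optional groups with empty strings, so the returned array always has four elements.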
Toto answered Nov 08 '22 10:11


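You get so many results because the pattern is not anchored, so preg_match_all restarts the match at each new position, 8 times in total. Every one of those substrings (including the empty strings) matches the first, non-optional part of the regex: ([a-z]*). The optional groups never take part, because \([a-zA-Z]\) and \[\w\] each allow only a single character between the brackets.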

The corrected regex:

preg_match_all("/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches); 
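
To check the difference, here is a small sketch that runs the question's original pattern and the anchored one against the question's subject string and compares the match counts. The variable names are illustrative only.

```php
<?php
$subject = 'foo(dd)[sdfgh]';

// Original, unanchored pattern: matching restarts after every match,
// and the empty string satisfies ([a-z]*) at each position.
$unanchored = '/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/';
$countUnanchored = preg_match_all($unanchored, $subject, $all);

// Anchored pattern with * inside the brackets, as in the corrected regex.
$anchored = '/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/';
$countAnchored = preg_match_all($anchored, $subject, $one);

echo $countUnanchored, "\n"; // 8
echo $countAnchored, "\n";   // 1
print_r($one);               // groups: 'foo', '(dd)', '[sdfgh]'
```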

EDIT (to exclude the brackets from the second part of the subject): we want 'ddd' instead of '(ddd)'.

This regex uses a non-capturing subpattern, (?: ... ), to mark an optional part of the subject without capturing it in the matches array.

preg_match_all("/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches);

The interesting part is this: (?:\(([a-zA-Z]*)\))?.

  • the first paren, (?:, marks the beginning of a non-capturing subpattern
  • the second paren, \(, is an escaped literal paren
  • the third paren, (, marks the beginning of a standard capturing subpattern

Of these three, only the contents of the third pair of parens will show up in the $matches array.
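
As a quick illustration, the sketch below applies the pattern to the same subject and inspects the captures. It uses preg_match rather than preg_match_all (a substitution made here for a flatter result array, since the anchored pattern matches at most once).

```php
<?php
// Pattern from above: the round brackets sit inside a non-capturing group,
// so only their contents are captured.
$pattern = '/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/';

preg_match($pattern, 'foo(ddd)[sdfgh]', $m);

// $m[0] is the whole match; $m[1]..$m[3] are the capturing groups.
print_r($m);
// $m[2] is 'ddd' (no parens), while $m[3] is still '[sdfgh]'
// because the square brackets are inside a capturing group.
```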

hegemon answered Nov 08 '22 10:11
