I would like to create a way of matching strings like
abc(xyz)
abc
abc(xyz)[123]
where each bracketed part is an optional unit. Ideally, I would like to have something like
preg_match_all('complicated regex', $mystring, $matches);
with $matches containing the following:
$mystring= abc(xyz)[123]R
gives $matches=array(0 => "abc", 1=> "xyz", 2=> "123", 3=> "R")
$mystring= abc(xyz)R
gives $matches=array(0 => "abc", 1=> "xyz", 2=> "", 3=> "R")
$mystring= abc[123]R
gives $matches=array(0 => "abc", 1=> "", 2=> "123", 3=> "R")
$mystring= abc(xyz)[123]
gives $matches=array(0 => "abc", 1=> "xyz", 2=> "123", 3=> "")
$mystring= abc
gives $matches=array(0 => "abc", 1=> "", 2=> "", 3=> "")
I hope you get the point. I tried as follows:
preg_match_all("/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/", "foo(dd)[sdfgh]", $matches)
for which $matches[0] is
Array
(
[0] => foo
[1] =>
[2] => dd
[3] =>
[4] =>
[5] => sdfgh
[6] =>
[7] =>
)
Why do I get the additional empty matches? How can I avoid them and get the results I need (either in $matches or in $matches[0])?
how about:
/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/
usage:
preg_match_all("/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$/", "abc[123]R", $matches);
print_r($matches);
output:
Array
(
[0] => Array
(
[0] => abc[123]R
)
[1] => Array
(
[0] => abc
)
[2] => Array
(
[0] =>
)
[3] => Array
(
[0] => 123
)
[4] => Array
(
[0] => R
)
)
explanation:
The regular expression:
(?-imsx:^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])(\w*)?$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  (                        group and capture to \4 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )?                       end of \4 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \4)
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
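One caveat about the pattern explained above: the square-bracket group has no ? quantifier, so it is mandatory, and inputs such as abc(xyz)R or abc would not match. The following is a minimal sketch of a variant, assuming the only change wanted is to make that group optional as well. The helper name parse_units is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: splits "name(paren)[bracket]rest" into four parts.
// Assumption: same pattern as above, plus a ? after the [...] group.
function parse_units(string $subject): ?array
{
    $pattern = '/^(\w*)(?:\((\w*)\))?(?:\[(\w*)\])?(\w*)$/';
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null; // subject does not have the expected shape
    }
    // $m[0] is the whole match; keep only the four captured parts.
    // Optional groups that did not take part come back as "".
    return array_slice($m, 1);
}

// The five sample strings from the question.
$samples = ['abc(xyz)[123]R', 'abc(xyz)R', 'abc[123]R', 'abc(xyz)[123]', 'abc'];
foreach ($samples as $s) {
    echo $s, ' => ', json_encode(parse_units($s)), "\n";
}
```

preg_match is used here instead of preg_match_all because the anchored pattern can match at most once per subject, which gives a flat array rather than an array of arrays.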
You get so many results because the pattern is not anchored, so matching restarts again and again along the subject, 8 times in total. Every substring (including the empty string) is accepted by the first part of the regex, ([a-z]*), and the remaining parts are optional.
The corrected regex:
preg_match_all("/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches);
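To see the effect of the anchors, the following sketch counts how many matches each pattern finds. The variable names are only illustrative; the unanchored pattern is the one from the question, applied here to its original subject.

```php
<?php
$subject = 'foo(dd)[sdfgh]';

// Original attempt: no anchors, and every part may match the empty string,
// so preg_match_all keeps finding (mostly empty) matches along the subject.
$unanchored = '/([a-z]*)(\([a-zA-Z]\))?(\[\w\])?/';
$countUnanchored = preg_match_all($unanchored, $subject, $m1);

// Corrected pattern: ^ and $ force one match spanning the whole subject,
// and the added * quantifiers let the brackets hold more than one character.
$anchored = '/^([a-z]*)(\([a-zA-Z]*\))?(\[\w*\])?$/';
$countAnchored = preg_match_all($anchored, $subject, $m2);

echo "unanchored matches: $countUnanchored\n";
echo "anchored matches:   $countAnchored\n";
print_r($m2); // groups 2 and 3 still include their brackets
```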
EDIT (to exclude the brackets in the second part of the subject): we want 'ddd' instead of '(ddd)'.
This regex uses a non-capturing subpattern, (?: ... ), to mark an optional part of the subject without capturing it in the matches array.
preg_match_all("/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/", "foo(ddd)[sdfgh]", $matches);
The interesting part is this: (?:\(([a-zA-Z]*)\))?
(?: marks the beginning of a non-capturing subpattern
\( is an escaped literal paren
( marks the beginning of a standard capturing subpattern
Only the contents of the third pair of parentheses will show up in the $matches array.
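The same wrapping can be applied to the square brackets. The sketch below compares the pattern from this answer with an assumed extension (not given in the original answer) that also moves the [ and ] outside the capturing group:

```php
<?php
$subject = 'foo(ddd)[sdfgh]';

// Pattern from the answer: only the round brackets are excluded.
preg_match('/^([a-z]*)(?:\(([a-zA-Z]*)\))?(\[\w*\])?$/', $subject, $a);

// Assumed extension: wrap the square brackets in (?: ... ) as well,
// so that group 3 captures only what is between them.
preg_match('/^([a-z]*)(?:\(([a-zA-Z]*)\))?(?:\[(\w*)\])?$/', $subject, $b);

print_r($a); // group 2 is "ddd", group 3 is still "[sdfgh]"
print_r($b); // group 2 is "ddd", group 3 is "sdfgh"
```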