I have a problem with matching optional pattern groups in regex. Metacharacters * and + are greedy, so I thought metacharacter ? would also be greedy, but it doesn't seem to function like I thought.
In theory, I assumed that making a pattern group optional works like this: if the group is found in the string, it is returned in the match results; if it isn't found, we still get an overall match, just without that group in the results.
What actually happens is that even when my pattern is present in the string, it isn't included in the match results. The regex engine seems to notice that the pattern group is optional and doesn't even attempt to match it.
If we set up a test and change this optional pattern group to non-optional, the engine includes it in the match results, but this is only practical for the test because sometimes this pattern won't be present in the string.
I need the match included in the results because I analyze the match results at a later date.
In case I have not described this scenario very well, I have set up a very simple example in PHP, which follows.
$string = 'This is a test, Stackoverflow. 2014 Cecili0n';
if(preg_match_all("~(This).*?(Stackoverflow)?~i",$string,$match))
print_r($match);
Results
Array
(
[0] => Array
(
[0] => This
)
[1] => Array
(
[0] => This
)
[2] => Array
(
[0] =>
)
)
(Stackoverflow)? is the optional pattern. If we run the above code, it will not be returned in the match results, even though this pattern is present in the string.
If we make this pattern group mandatory, it will be returned in the results, as in the following.
if(preg_match_all("~(This).*?(Stackoverflow)~i",$string,$match))
print_r($match);
Results
Array
(
[0] => Array
(
[0] => This
)
[1] => Array
(
[0] => This
)
[2] => Array
(
[0] => Stackoverflow
)
)
How can I achieve this? It is important for me to get accurate data on how the match was found.
Thanks for any thoughts on the matter.
This might be surprising, but it is actually expected behavior. Let's break down the regex and translate it to human-readable terms:
(This)            Match "This" literally
.*?               Match any character **as few times as possible**, while still allowing the rest of the expression to match
(Stackoverflow)?  Match "Stackoverflow" literally **if possible**
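So what happens is:
1. After "This" has matched, the engine has to decide how many characters the .*? quantifier should match.
2. Being lazy, it tries zero characters first. Can (Stackoverflow)? match " is a test, Stackoverflow. 2014 Cecili0n"? Yes, by matching nothing, so .*? matches zero characters.
3. What does (Stackoverflow)? match? Obviously nothing at the position where the match is attempted.
End result: both quantified subpatterns match the empty string.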
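The walkthrough above can be checked with a minimal sketch. It assumes the question's $string and adds one extra capturing group around .*? (not in the original pattern, here only so we can see what the lazy quantifier consumed); the variable names are illustrative.

```php
<?php
$string = 'This is a test, Stackoverflow. 2014 Cecili0n';

// Group 2 wraps the lazy quantifier purely for inspection (an addition for
// this demo); group 3 is the optional group from the question.
preg_match('~(This)(.*?)(Stackoverflow)?~i', $string, $m);

$lazyPart     = $m[2];        // what .*? consumed
$optionalPart = $m[3] ?? '';  // preg_match omits trailing groups that did not take part

var_dump($m[0]);          // string(4) "This"
var_dump($lazyPart);      // string(0) ""
var_dump($optionalPart);  // string(0) ""
```

The overall match stops right after "This": the lazy quantifier was satisfied with zero characters, and the optional group was satisfied with nothing at that position.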
If making everything optional won't work, how do you optionally match "Stackoverflow"? By explicitly spelling out the acceptable options to the regex engine:
~(This)(.*?(Stackoverflow)|.*?)~i
This instructs the engine to either match as few characters as necessary followed by the literal "Stackoverflow", or otherwise match as few characters as possible. By listing the "Stackoverflow included" option first, you are assured that if it does exist in the text it will be matched.
Obviously the .*? option does not make too much sense in this example, but I am leaving it as it is because I wanted to describe a "mechanical" transformation that will work regardless of the actual regular expression.
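As a quick check of what this alternation captures, here is a minimal sketch that runs it over the question's $string with preg_match ($m is just an illustrative variable name). Note where "Stackoverflow" ends up: the outer group that holds the alternation takes index 2, so the literal lands at index 3.

```php
<?php
$string = 'This is a test, Stackoverflow. 2014 Cecili0n';

// The outer parentheses exist only to hold the alternation,
// but they still count as capturing group 2.
preg_match('~(This)(.*?(Stackoverflow)|.*?)~i', $string, $m);

echo $m[0], "\n"; // This is a test, Stackoverflow
echo $m[1], "\n"; // This
echo $m[2], "\n"; // " is a test, Stackoverflow" (with a leading space)
echo $m[3], "\n"; // Stackoverflow
```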
Note that to maintain full equivalence with the original regex the extra group introduced for structural purposes has to be made non-capturing:
~(This)(?:.*?(Stackoverflow)|.*)~i
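A minimal sketch of how this pattern behaves with and without the optional text. The helper findOptional and the second test string are hypothetical, introduced only for illustration; the pattern is the one shown above.

```php
<?php
// Hypothetical helper (not from the original post): returns the text captured
// by the optional group, or null when it did not take part in the match.
function findOptional(string $subject): ?string
{
    $pattern = '~(This)(?:.*?(Stackoverflow)|.*)~i';
    if (!preg_match($pattern, $subject, $m)) {
        return null; // "This" itself was not found
    }
    // preg_match leaves out trailing groups that did not participate,
    // so index 2 may simply be missing.
    return $m[2] ?? null;
}

var_dump(findOptional('This is a test, Stackoverflow. 2014 Cecili0n')); // string(13) "Stackoverflow"
var_dump(findOptional('This is a test. 2014 Cecili0n'));                // NULL
```

Because the structural group is non-capturing, "Stackoverflow" stays at index 2, the same index it has in the question's pattern, so code that reads the results later does not need to change.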