I'm writing a simple shell in C under Linux. I'm trying to parse user input with POSIX regex with group capturing. My problem is that I don't want to capture all the groups, but the ?: syntax doesn't seem to work for me.
"^(?:[A-Za-z0-9]+)( [A-Za-z0-9]*(?:\"[^\"]*\")*(?:\'[^\']*\')*[A-Za-z0-9]*)*&?$"
The use of (?:..), or any other grouping prefix, is not allowed in POSIX Regular Expressions.
There are tools for building language parsers, lex and yacc for example, and the standard provides a simplified yacc grammar for the POSIX shell.
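If staying with regcomp() is preferred, one workaround is to keep every group as a plain capturing group and read only the regmatch_t entries of interest. The following is a minimal sketch under that assumption, not a definitive implementation: the function name split_command, the helper copy_group, and the extra pair of parentheses wrapped around the argument tail are illustrative choices that do not appear in the question.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copy the text of one submatch into out; out is always NUL-terminated. */
static void copy_group(const char *src, const regmatch_t *m,
                       char *out, size_t outsz)
{
    size_t len = 0;

    if (m->rm_so != -1)                 /* -1 means the group did not participate */
        len = (size_t)(m->rm_eo - m->rm_so);
    if (len >= outsz)                   /* truncate rather than overflow */
        len = outsz - 1;
    if (len > 0)
        memcpy(out, src + m->rm_so, len);
    out[len] = '\0';
}

/*
 * Hypothetical helper: split a line into a command name (group 1) and the
 * raw argument tail (group 2). The inner groups 3..5 exist only for
 * grouping and are never read, which stands in for (?:...).
 * Returns 0 on a match, -1 on no match or compile failure.
 */
int split_command(const char *line, char *cmd, size_t cmdsz,
                  char *args, size_t argsz)
{
    static const char *pattern =
        "^([A-Za-z0-9]+)"
        "(( [A-Za-z0-9]*(\"[^\"]*\")*('[^']*')*[A-Za-z0-9]*)*)"
        "&?$";
    regex_t re;
    regmatch_t m[3];                    /* whole match, group 1, group 2 */
    int rc;

    assert(line != NULL && cmdsz > 0 && argsz > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 3, m, 0);   /* nmatch = 3: later groups are not reported */
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_group(line, &m[1], cmd, cmdsz);
    copy_group(line, &m[2], args, argsz);
    return 0;
}
```

Passing a small nmatch to regexec() means the unwanted groups are never reported at all, so the caller does not have to skip over them. Note that the character classes from the question do not allow characters such as - or |, so a line like "echo |" is rejected by this pattern.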
The character sequence (? is undefined as per section 9.4.3 ERE Special Characters:
*+?{
The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall be special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:
If these characters appear first in an ERE, or immediately following an unescaped <vertical-line>, <circumflex>, <dollar-sign>, or <left-parenthesis>
If a <left-brace> is not part of a valid interval expression (see EREs Matching Multiple Characters)
A POSIX RE implementation has a few choices for how to handle undefined syntax. Those choices include enabling an extended syntax as per section 9.1 Regular Expression Definitions. So it's free to implement the non-capturing group syntax:
[...] violations of the specified syntax or semantics for REs produce undefined results: this may entail an error, enabling an extended syntax for that RE, or using the construct in error as literal characters to be matched.
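Because the outcome is left to the implementation, the only way to know what a given libc does with (?: is to try it. Here is a rough sketch of such a probe, assuming nothing beyond regcomp(), regerror() and regfree(); the function name probe_ere is made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern as a POSIX ERE and report what the
 * local implementation does with it. Returns 0 if regcomp() accepted the
 * pattern, otherwise the regcomp() error code, with a human-readable
 * message written to errbuf.
 */
int probe_ere(const char *pattern, char *errbuf, size_t errsz)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    if (errsz > 0)
        errbuf[0] = '\0';               /* empty message means "accepted" */
    if (rc != 0) {
        regerror(rc, &re, errbuf, errsz);
        return rc;
    }
    regfree(&re);
    return 0;
}
```

A return value of 0 for a pattern such as "(?:a)" only shows that the pattern compiled. It does not show whether the implementation treats it as a non-capturing group or as literal characters, so the match results would still need checking.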
If you'd like to see the feature as part of a future POSIX standard, you could open an issue on the standard's issue tracker.