
Posix regular expression non capturing group

Tags: c, regex, linux, posix

I'm writing a simple shell in C under Linux. I'm trying to parse user input with POSIX regex using capture groups. My problem is that I don't want to capture all the groups, but the ?: syntax doesn't seem to work for me.

"^(?:[A-Za-z0-9]+)( [A-Za-z0-9]*(?:\"[^\"]*\")*(?:\'[^\']*\')*[A-Za-z0-9]*)*&?$"
luki180 asked Nov 26 '16 07:11


2 Answers

The use of (?:..), or any other grouping prefix, is not allowed in POSIX Regular Expressions.

There are tools for building language parsers, lex and yacc for example, and the standard provides a simplified yacc grammar for the POSIX shell.
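If a regex is still preferred for simple input, one workaround (not part of the answer above, just a minimal sketch) is to keep ordinary capturing groups and ignore the match slots you don't need. The function name parse_command and the simplified pattern below are hypothetical, chosen only for illustration; the pattern is deliberately smaller than the one in the question.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extracts the command name and a trailing " &" flag.
 * Group 1 = command name, group 2 = repeated argument (ignored),
 * group 3 = optional background marker.
 * Returns 0 on success, -1 on compile failure, no match, or small buffer.
 */
int parse_command(const char *line, char *cmd, size_t cmdsz, int *background)
{
    /* Assumed, simplified pattern; POSIX ERE has only capturing groups. */
    static const char *pattern = "^([A-Za-z0-9]+)( [A-Za-z0-9]+)*( &)?$";
    regex_t re;
    regmatch_t m[4]; /* m[0] = whole match, m[1..3] = groups */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int rc = regexec(&re, line, 4, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= cmdsz)
        return -1;
    memcpy(cmd, line + m[1].rm_so, len);
    cmd[len] = '\0';

    /* m[2] is the group we would have made non-capturing: just skip it. */

    /* An unmatched optional group reports rm_so == -1. */
    *background = (m[3].rm_so != -1);
    return 0;
}
```

Calling parse_command("echo hello world &", buf, sizeof buf, &bg) should fill buf with the command name and set bg to a non-zero value. The extra group costs one regmatch_t slot but keeps the pattern within what POSIX defines.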

kdhp answered Oct 17 '22 07:10


The character sequence (? is undefined as per section 9.4.3 ERE Special Characters:

*+?{

The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall be special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:

  • If these characters appear first in an ERE, or immediately following an unescaped <vertical-line>, <circumflex>, <dollar-sign>, or <left-parenthesis>

  • If a <left-brace> is not part of a valid interval expression (see EREs Matching Multiple Characters)

A POSIX RE implementation has a few choices for how to handle undefined syntax. Those choices include enabling an extended syntax as per section 9.1 Regular Expression Definitions. So it's free to implement the non-capturing group syntax:

[...] violations of the specified syntax or semantics for REs produce undefined results: this may entail an error, enabling an extended syntax for that RE, or using the construct in error as literal characters to be matched.
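Since the outcome is undefined, the only way to know what a particular libc does with (?: is to try it. The sketch below is a small probe under that assumption; try_compile is a hypothetical helper name, and the result for "(?:a)b" will vary between implementations, so no particular outcome is claimed here.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical probe: compiles `pattern` as a POSIX ERE.
 * Returns 0 and an empty message if the implementation accepts it,
 * otherwise the regcomp() error code with the regerror() text in `msg`.
 */
int try_compile(const char *pattern, char *msg, size_t msgsz)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0) {
        /* regerror() accepts the regex_t from the failed regcomp() call. */
        regerror(rc, &re, msg, msgsz);
        return rc;
    }

    regfree(&re);
    if (msgsz > 0)
        msg[0] = '\0';
    return 0;
}
```

Calling try_compile("(?:a)b", msg, sizeof msg) and checking the return value shows whether the local regcomp() rejects the sequence or accepts it. Even when it is accepted, the pattern may be treated as an extension or as literal characters, so a successful compile alone does not prove that non-capturing groups are supported.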

If you'd like to see the feature as part of a future POSIX standard, you could open an issue on the standard's issue tracker.

nebel answered Oct 17 '22 07:10