The input is a string representing a list of elements.
A list is defined as an open curly { followed by 0 or more elements separated by whitespace, followed by a closed curly }.
An element is either a literal or a list of elements.
A literal is a succession of non-whitespace characters. If a literal contains a curly bracket, it must be escaped with a backslash: \{ and \}. (Or you could assume curlies are not allowed inside literals, for simplicity.)
Example:
"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x\{yz \}foo }"
No curlies inside literals:
"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} xyz foo }"
(This is a simplified definition of a Tcl list.)
What I want to know is: can the input be split into the elements of the outermost list using regex?
Expected output:
abc
{ def ghi }
7
{ 1 {2} {3 4} }
{5 6}
x{yz
}foo
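For reference, here is a minimal non-regex sketch of the splitting I have in mind, written in Python. It is only a baseline that a regex answer would have to reproduce, not the regex itself. The function name `split_top_level` is made up for this illustration, and it assumes the input is well formed (every curly inside a literal is escaped, and the outermost list is wrapped in one pair of curlies).

```python
def split_top_level(text):
    """Split a curly-delimited list into the elements of its outermost list."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("input must be wrapped in { ... }")
    body = body[1:-1]  # drop the outermost pair of curlies

    elements, current, depth = [], [], 0
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in "{}":
            # Escaped curly: it never changes the nesting depth.
            # Unescape it at the top level, keep nested lists verbatim.
            current.append(body[i + 1] if depth == 0 else body[i:i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced curlies")
        if ch.isspace() and depth == 0:
            # Whitespace outside any nested list ends the current element.
            if current:
                elements.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1

    if depth != 0:
        raise ValueError("unbalanced curlies")
    if current:
        elements.append("".join(current))
    return elements


if __name__ == "__main__":
    sample = r"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x\{yz \}foo }"
    for element in split_top_level(sample):
        print(element)
```

Nested lists come back as raw text, so the same function can be applied to them again if deeper levels are needed.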
The real question is: can this be done with a Regex?
I'm most interested in the .NET flavour, but will accept any answers.
I'll post my own assumption in an answer, and see if it's validated or destroyed.
Unfortunately the answer is YES for some flavors of regex, e.g. PCRE and .NET, because they support recursive patterns and stack-like operations (balancing groups) respectively.
The grammar can be written as
ELEMENT -> (?!\{)\S+ | LIST
LIST -> '\{\s*' ELEMENT? ('\s+' ELEMENT)* '\s*\}'
thus in PCRE, this can be transformed into the pattern:
\{\s*(?0)?(?:\s+(?0))*\s*\}|(?!\{)(?:[^\s}]|\}(?![\s}]))+
# \{\s*(?0)?(?:\s+(?0))*\s*\}  is LIST
# \}(?![\s}])  makes sure the } is not closing the group
See http://www.ideone.com/SnGsU for an example (I have stripped the top-level { and } for simplicity).
(Of course, don't try this at work :) )
(BTW, I don't know how to transform this PCRE pattern into the .NET flavor. If someone knows, please try "Converting PCRE recursive regex pattern to .NET balancing groups definition".)
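To make the recursion in the ELEMENT/LIST grammar above more concrete, here is a rough Python sketch that follows the two rules by hand, using only the standard `re` module (which has no `(?0)`). The names `parse_list`, `match_element` and `split_outermost` are made up for this illustration. One assumption differs from the PCRE pattern: a literal here is a run of non-whitespace characters in which every curly is backslash-escaped, as in the question's definition, rather than the lookahead trick on `}`.

```python
import re

# Literal: escaped curlies, or any non-whitespace character that is not a curly.
LITERAL = re.compile(r"(?:\\[{}]|[^\s{}])+")
WS = re.compile(r"\s+")
OPT_WS = re.compile(r"\s*")


def match_element(s, pos):
    """ELEMENT -> LIST | literal. Return the end index, or None if no match."""
    parsed = parse_list(s, pos)  # plays the role of the recursive (?0) call
    if parsed is not None:
        return parsed[0]
    m = LITERAL.match(s, pos)
    return m.end() if m else None


def parse_list(s, pos):
    """LIST -> '{' ws* ELEMENT? (ws+ ELEMENT)* ws* '}'.

    Return (end index, spans of the direct child elements), or None.
    """
    if not s.startswith("{", pos):
        return None
    pos = OPT_WS.match(s, pos + 1).end()
    children = []
    end = match_element(s, pos)
    while end is not None:
        children.append((pos, end))
        ws = WS.match(s, end)
        if ws is None:  # no separator, so no further element may follow
            pos = end
            break
        pos = ws.end()
        end = match_element(s, pos)
    if s.startswith("}", pos):
        return pos + 1, children
    return None


def split_outermost(s):
    """Return the raw text of each element of the outermost list."""
    parsed = parse_list(s, 0)
    if parsed is None or parsed[0] != len(s):
        raise ValueError("not a well-formed list")
    return [s[a:b] for a, b in parsed[1]]


if __name__ == "__main__":
    sample = r"{abc { def ghi } 7 { 1 {2} {3 4} } {5 6} x\{yz \}foo }"
    print(split_outermost(sample))
```

Unlike the expected output in the question, this sketch returns literals with their backslashes still in place; removing the escapes would be a separate step.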