(PHP) Parsing RegEx string - balancing brackets

I'm trying to parse strings in the following format (EBNF, I hope this is right) in PHP:

<exp>      ::= <base>[{<modifier>["!"]"("<exp>")"}]
<base>     ::= <role>[{<modifier><role>}]
<modifier> ::= "&" | "|"
<role>     ::= ["!"]<str>[","<str>]

Where <str> is any string that would pass [a-zA-Z0-9\-]+
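
To make the <role> rule concrete, a single role could be checked on its own with a pattern like the one below. This is only a sketch: the function name isRole is made up for illustration, and it reads the rule literally, allowing at most one ","<str> suffix.

```php
<?php
// Hypothetical helper: checks one <role> ::= ["!"]<str>[","<str>]
function isRole(string $candidate): bool
{
    // The D modifier makes $ match only at the very end of the subject,
    // not before a trailing newline.
    $pattern = '/^!?[a-zA-Z0-9\-]+(?:,[a-zA-Z0-9\-]+)?$/D';

    return preg_match($pattern, $candidate) === 1;
}

var_dump(isRole('!token5,12'));    // bool(true)
var_dump(isRole('token1&token2')); // bool(false): "&" is not part of a role
```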

The following are examples of patterns that would have to be parsed:

token1
token1&token2
token1|(token2&!token3)
(token1&token2)|(token3&(token4|(!token5,12&token6)))
!(token1&token2|(token3&!token4))|token5,12

I am trying to write a RegEx pattern that would always give me four groups:

1. The left-most <expression>. From the above example this would be:
   - token1
   - token1
   - token1
   - token1&token2
   - token1&token2|(token3&!token4)
2. Whether ["!"] was present, i.e.:
   - null
   - null
   - null
   - null
   - !
3. The <modifier> for the next <expression> (if any). This would be:
   - null
   - &
   - |
   - |
   - |
4. The remainder of the pattern:
   - null
   - token2
   - token2&!token3
   - token3&(token4|(!token5,12&token6))
   - token5,12

I can parse this provided that the first expression doesn't contain any <modifier>s.

^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$

I am stuck at this point. I have tried using lookarounds, but I can't figure out how to ensure that the group is captured only when all brackets are balanced. Is this achievable with RegEx, or do I need to write code using loops etc. to do this?

As far as I know, it is impossible.

You have a context-free grammar (EBNF is meant for this type of grammar, Type-2 in the Chomsky hierarchy). A language with arbitrarily nested brackets cannot be parsed with regular expressions in the formal sense, because those only describe regular grammars (Type-3).

http://en.wikipedia.org/wiki/Chomsky_hierarchy

As an example of what you cannot handle here, take the number of opening parentheses: you can only write one regexp for each fixed nesting depth (but the depth can be unbounded, right?). Otherwise there is no way to tell whether the number of matching closing parentheses is the same. There is no way to count how many characters were matched by a specific part of a regexp with quantifiers (+, *, etc.).
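
The counting that a regular expression cannot do is easy with a plain loop and a depth counter. Below is a minimal sketch of that approach, producing the four groups asked for in the question. The function names findClosingBracket and splitExpression are made up for illustration. The code assumes the outer brackets around the remainder should be dropped only when they enclose all of it, and it does not validate the individual <str> tokens.

```php
<?php
// Hypothetical helper: index of the ")" that closes the "(" at $open, or -1.
function findClosingBracket(string $s, int $open): int
{
    $depth = 0;
    for ($i = $open, $n = strlen($s); $i < $n; $i++) {
        if ($s[$i] === '(') {
            $depth++;
        } elseif ($s[$i] === ')' && --$depth === 0) {
            return $i;
        }
    }
    return -1; // ran out of input before the brackets balanced
}

// Hypothetical splitter: returns [left expression, "!" or null,
// modifier or null, remainder or null].
function splitExpression(string $input): array
{
    $length = strlen($input);
    $pos = 0;
    $negation = null;

    if ($length > 0 && $input[0] === '!') {
        $negation = '!';
        $pos = 1;
    }
    if ($pos >= $length) {
        throw new InvalidArgumentException("Empty expression in: $input");
    }

    if ($input[$pos] === '(') {
        // Bracketed left-hand side: take everything up to the matching ")".
        $close = findClosingBracket($input, $pos);
        if ($close === -1) {
            throw new InvalidArgumentException("Unbalanced brackets in: $input");
        }
        $left = substr($input, $pos + 1, $close - $pos - 1);
        $pos = $close + 1;
    } else {
        // Plain role: take everything up to the next modifier.
        $span = strcspn($input, '&|', $pos);
        $left = substr($input, $pos, $span);
        $pos += $span;
    }
    if ($left === '') {
        throw new InvalidArgumentException("Missing left expression in: $input");
    }

    $modifier = null;
    $rest = null;
    if ($pos < $length) {
        $modifier = $input[$pos];
        $rest = substr($input, $pos + 1);
        if (($modifier !== '&' && $modifier !== '|') || $rest === '') {
            throw new InvalidArgumentException("Malformed input at offset $pos in: $input");
        }
        // Drop the brackets around the remainder only if they enclose all of it.
        if ($rest[0] === '(' && findClosingBracket($rest, 0) === strlen($rest) - 1) {
            $rest = substr($rest, 1, -1);
        }
    }

    return [$left, $negation, $modifier, $rest];
}

var_dump(splitExpression('!(token1&token2|(token3&!token4))|token5,12'));
```

Calling splitExpression again on the first and last elements of the result would walk the whole expression, which is the loop-based (or recursive) parsing that the question asks about.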

Tags:

regex

php

Bart Platak

1 Answers

scriptin
