
Regular Expression to Identify If Statements

Tags: regex

I'm trying to write a regular expression to identify an if statement. The only problem I'm having is getting it to capture if statements that have nested parentheses inside the condition's parentheses. For example:

if (condition_function(params)) {
     statements;
}

My expression to capture all if statements except these is:

 if\s*\(([^\(\)]|\s)*\)\s*{(.|\s)*?}

Does anyone know how to write an expression that also handles the nested parentheses?
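To make the failure concrete, here is a minimal Python sketch. It assumes Python's standard re module accepts the pattern from the question as written (the pattern uses no engine-specific features); the variable names and the flat sample are illustrative and not part of the original post.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"if\s*\(([^\(\)]|\s)*\)\s*{(.|\s)*?}")

# A condition without nested parentheses (illustrative sample).
flat = "if (x == 1) {\n    statements;\n}"
# The example from the question: a function call inside the condition.
nested = "if (condition_function(params)) {\n    statements;\n}"

flat_match = PATTERN.search(flat)
nested_match = PATTERN.search(nested)

print(flat_match is not None)    # True: the flat condition is matched
print(nested_match is not None)  # False: [^\(\)] cannot step over the inner "("
```

The character class [^\(\)] stops at the first inner parenthesis, and nothing else in the pattern can consume it, so the nested case never matches.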

Koukaakiva asked Mar 16 '09 17:03




2 Answers

That is not possible with regular expressions in the formal sense: they can only match regular languages, and the language you are trying to parse is context-free, not regular (thanks to dirkgently and dmckee).

Have a look at WP: Formal language theory if you are interested.

By the way, you can't even check whether an expression made only of brackets is balanced ([[][]] is balanced but []][ is not), which is a "subproblem" of the one you gave above.
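The bracket subproblem is easy once you are allowed to count, which is exactly what a formal regular expression cannot do for arbitrary depth. A minimal Python sketch, with is_balanced as a hypothetical helper name:

```python
def is_balanced(text: str) -> bool:
    """Return True if every '[' has a matching ']' in the right order."""
    depth = 0  # number of brackets currently open; unbounded in principle
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # a closing bracket appeared before its opener
                return False
    return depth == 0  # anything still open means the input is unbalanced


print(is_balanced("[[][]]"))  # True
print(is_balanced("[]]["))    # False
```

The counter can grow without limit, while a finite automaton has only a fixed number of states, which is the intuition behind the claim above.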

Johannes Weiss answered Nov 24 '22 01:11



I think this may work. If anyone sees something I don't, like a reason it won't work, please respond.

if\s*\(((?:[^\(\)]|\((?1)\))*+)\)\s*{((?:[^{}]|{(?2)})*+)}

The only problem this should encounter now is an if statement nested inside another if statement.

I've tested this on every valid if statement that I can think of that might break it and the only thing that it does not work on is one that contains a string with an unmatched parenthesis.

Update: I found an error in the above regular expression. It does not catch if statements that contain strings with unmatched parentheses or braces in their condition or statement sections, like the following example:

if (var1 == "("){
    echo "{";
}

However, this is a valid if statement. The solution:

if\s*\(((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^\(\)]|\((?1)\))*+)\)\s*{((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^{}]|{(?2)})*+)}\s*

This regular expression captures all if statements, even ones that contain strings with unmatched parentheses.
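The same two ideas (skip over quoted strings, count nesting depth) can also be sketched as a small hand-written scanner, which may be easier to debug than the long pattern. This is not the approach used in this answer, only an illustration in Python; find_matching is a hypothetical helper, and it makes no attempt to handle comments.

```python
def find_matching(text: str, start: int, open_ch: str = "(", close_ch: str = ")") -> int:
    """Return the index of the delimiter that closes text[start], or -1.

    Quoted strings (single or double, with backslash escapes) are skipped,
    so delimiters inside them do not change the nesting depth.
    """
    if text[start] != open_ch:
        raise ValueError("start must point at the opening delimiter")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            quote = ch
            i += 1
            # Advance to the closing quote, stepping over escaped characters.
            while i < len(text) and text[i] != quote:
                if text[i] == "\\":
                    i += 1
                i += 1
        elif ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # ran out of input before the delimiter was closed


# The example from the update above.
source = 'if (var1 == "("){\n    echo "{";\n}'
cond_open = source.index("(")
cond_close = find_matching(source, cond_open)
body_open = source.index("{", cond_close)
body_close = find_matching(source, body_open, "{", "}")

print(source[cond_open + 1:cond_close])  # var1 == "("
print(body_close == len(source) - 1)     # True
```

Locating the "if" keyword itself is left out of the sketch; in this example the first "(" happens to be the start of the condition.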

UPDATE: I now have it so that it captures the else and else if statements that are attached to an if statement. The only problem is that the capture groups it returns hold only the last else and the last else if in the chain, because a capture group inside a repeated group keeps only its final iteration. Hopefully I'll figure out how to get around that as well.

if\s*\(((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^\(\)]|\((?1)\))*+)\)\s*{((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^{}]|{(?2)})*+)}\s*(?:(?:else\s*{((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^{}]|{(?3)})*+)}\s*)|(?:else\s*if\s*\(((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^\(\)]|\((?4)\))*+)\)\s*{((?:(?:(?:"(?:(?:\\")|[^"])*")|(?:'(?:(?:\\')|[^'])*'))|[^{}]|{(?5)})*+)}\s*))*
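The capture-group limitation mentioned above can be reproduced in isolation. The sketch below uses Python's standard re module, which does not support (?1)-style recursion, so the else if pattern here is a deliberately simplified assumption (flat conditions and bodies, no strings or nesting) and not the pattern from this answer. One possible workaround is to match the chain first and then iterate over its clauses one at a time.

```python
import re

# Simplified, non-recursive stand-in for one "else if" clause (an assumption).
ELSE_IF = r"else\s*if\s*\(([^()]*)\)\s*\{([^{}]*)\}\s*"

chain = "else if (a) { x; } else if (b) { y; } else { z; }"

# Repeating the clause inside one match overwrites groups 1 and 2 each time.
repeated = re.match("(?:" + ELSE_IF + ")*", chain)
print(repeated.groups())  # ('b', ' y; ') -- only the last iteration survives

# Matching one clause per iteration keeps every condition and body.
clauses = [m.groups() for m in re.finditer(ELSE_IF, chain)]
print(clauses)  # [('a', ' x; '), ('b', ' y; ')]
```

Engines differ in how much of the repetition history they expose, so whether a single-pattern solution exists depends on the regex flavor being used.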

If you want to test it out, here's a great website for it: http://gskinner.com/RegExr/

Koukaakiva answered Nov 24 '22 00:11
