Regular expression to detect semi-colon terminated C++ for & while loops

Tags: c++, python, regex, parsing, recursion

In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semi-colon (;). For example, it should match this:

for (int i = 0; i < 10; i++);

... but not this:

for (int i = 0; i < 10; i++)

This looks trivial at first glance, until you realise that the text between the opening and closing parentheses may contain other parentheses, for example:

for (int i = funcA(); i < funcB(); i++);

I'm using Python's re module. Right now my regular expression looks like this (I've left my comments in so you can understand it more easily):

# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*

This works perfectly for all the above cases, but it breaks as soon as the third part of the for loop contains a function call, like so:

for (int i = 0; i < 10; doSomethingTo(i));

I think it breaks because as soon as you put some text between the opening and closing parentheses, the "balanced" group matches that contained text, and thus the (?P=balanced) part no longer works, since it won't match (because the text inside the parentheses is different).
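As far as I understand the re module, this suspicion matches how (?P=name) behaves: it is a backreference that matches the exact text the named group captured, not a re-application of the group's pattern. A minimal sketch of that behaviour follows; the pattern and sample strings are illustrative assumptions, not part of the code above:

```python
import re

# (?P=word) must match the same text that the group 'word' captured,
# not merely any text that the group's pattern (\w+) could match.
PAIR = re.compile(r"(?P<word>\w+) (?P=word)")

same = PAIR.fullmatch("abc abc")       # second word repeats the first
different = PAIR.fullmatch("abc abd")  # second word fits \w+ but differs

print(same is not None)       # True
print(different is not None)  # False
```

So a backreference cannot stand in for "another balanced substring", which is what the nested part of the pattern would need.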

In my Python code I'm using the VERBOSE and MULTILINE flags, and creating the regular expression like so:

REGEX_STR = r"""# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches
    # a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*"""

REGEX_OBJ = re.compile(REGEX_STR, re.MULTILINE | re.VERBOSE)

Can anyone suggest an improvement to this regular expression? It's getting too complicated for me to get my head around.

asked Feb 07 '09 20:02

Thomi

1 Answer

You could write a little, very simple routine that does it, without using a regular expression:

Set a position counter pos so that it points to just before the opening bracket after your for or while.
Set an open-brackets counter openBr to 0.
Now keep incrementing pos, reading the character at each position; increment openBr when you see an opening bracket and decrement it when you see a closing bracket. This increments it once at the beginning, for the first opening bracket in "for (", increments and decrements it further for any brackets in between, and sets it back to 0 when your for bracket closes.
So, stop when openBr is 0 again.

The stopping position is the closing bracket of your for(...). Now you can check whether a semicolon follows it or not.
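The steps above could be sketched in Python roughly as follows. This is only one possible reading of the description: the names LOOP_START, SEMICOLON, find_closing_bracket and semicolon_terminated_loops are hypothetical, and a small regular expression is still used, but only to locate the for/while keyword and the trailing semicolon, not to balance brackets.

```python
import re

# Hypothetical helper patterns: find "for (" / "while (" at the start of a
# line, and an optional run of whitespace followed by a semicolon.
LOOP_START = re.compile(r"^\s*(?:for|while)\s*\(", re.MULTILINE)
SEMICOLON = re.compile(r"\s*;")


def find_closing_bracket(text, open_pos):
    """Return the index of the bracket that closes the one at open_pos, or -1."""
    open_br = 0
    for pos in range(open_pos, len(text)):
        if text[pos] == "(":
            open_br += 1
        elif text[pos] == ")":
            open_br -= 1
            if open_br == 0:  # back to zero: the loop header is closed
                return pos
    return -1  # brackets never balanced


def semicolon_terminated_loops(source):
    """Yield each for/while header that is directly followed by a semicolon."""
    for match in LOOP_START.finditer(source):
        open_pos = match.end() - 1  # index of the "(" after for/while
        close_pos = find_closing_bracket(source, open_pos)
        if close_pos != -1 and SEMICOLON.match(source, close_pos + 1):
            yield source[match.start():close_pos + 1].strip()


sample = "for (int i = 0; i < 10; doSomethingTo(i));\nfor (int i = 0; i < 10; i++)\n"
print(list(semicolon_terminated_loops(sample)))
# prints ['for (int i = 0; i < 10; doSomethingTo(i))']
```

Note that this sketch counts every bracket it sees, so brackets inside string literals, character literals or comments would throw it off, and the while (...); that closes a do-while loop would be reported if it starts its own line.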

answered Oct 03 '22 03:10

Frank
