
Regex to parse C/C++ functions declarations

I need to parse C and C++ function declarations and split them into their main components (return type, function name or class and method, parameters, etc.).

I'm working from either headers or a list where the signatures take the form:

public: void __thiscall myClass::method(int, class myOtherClass * )

I have the following regex, which works for most functions:

(?<expo>public\:|protected\:|private\:) (?<ret>(const )*(void|int|unsigned int|long|unsigned long|float|double|(class .*)|(enum .*))) (?<decl>__thiscall|__cdecl|__stdcall|__fastcall|__clrcall) (?<ns>.*)\:\:(?<class>(.*)((<.*>)*))\:\:(?<method>(.*)((<.*>)*))\((?<params>((.*(<.*>)?)(,)?)*)\)

A few functions fail to parse even though they appear to match the pattern. I'm not worried about matching functions that aren't members of a class at the moment (I can handle that later). The expression is used in a C# program, so the <label>s are named groups that make the parts easy to retrieve.

Is there a standard regex that parses all function declarations? If not, how can I improve mine to handle the odd exceptions?

ssube asked Aug 04 '10 19:08



1 Answer

C++ is notoriously hard to parse; no regex can catch every case. For example, a declaration can contain arbitrarily deep nesting of parentheses, and matching balanced parentheses is beyond what a regular language can express, so even this subset of C++ is not regular.
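To illustrate the nesting problem, here is a minimal sketch of splitting a parameter list with an explicit depth counter instead of a regex. It is written in Python rather than the C# from the question, and the helper name `split_params` is hypothetical. It assumes the brackets in the input are balanced and does not handle operator names such as `operator>>`.

```python
def split_params(params):
    """Split a parameter list on commas that are not nested in (), <> or []."""
    parts = []
    current = []
    depth = 0  # how many brackets are currently open
    for ch in params:
        if ch in "(<[":
            depth += 1
        elif ch in ")>]":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current parameter.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


simple = split_params("int, class myOtherClass * ")
nested = split_params("void (__cdecl*)(int, int), class std::map<int, float> &")
```

A counter like this keeps track of state that a plain regular expression cannot, which is why a regex for the outer shape of the signature plus a small hand-written pass over the parameters tends to be more robust than one large pattern.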

But it seems that you're going for practicality, not theoretical correctness. Just keep improving your regex until it catches the cases it needs to catch, and try to make it as stringent as possible so you don't get any false matches.
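As one possible tightening, the sketch below narrows each group to the characters it should contain and makes the namespace optional. It appears that the pattern in the question requires two `::` separators, so a signature without a namespace, like the sample one, would not match it. The sketch is in Python, where named groups are written `(?P<name>...)` instead of .NET's `(?<name>...)`; the verbose layout corresponds to `RegexOptions.IgnorePatternWhitespace`. The names `SIGNATURE` and `parse_signature` and the namespaced test signature are made up for illustration, and the pattern is not claimed to cover every declaration.

```python
import re

# Verbose pattern: whitespace inside the pattern is ignored, so literal
# spaces in the signature are matched with \s+ instead.
SIGNATURE = re.compile(r"""
    ^(?P<expo>public|protected|private):\s+
    (?P<ret>.+?)\s+
    (?P<decl>__thiscall|__cdecl|__stdcall|__fastcall|__clrcall)\s+
    (?:(?P<ns>[\w:]+)::)?
    (?P<class>\w+(?:<[^()]*>)?)::
    (?P<method>~?\w+)
    \((?P<params>.*)\)$
""", re.VERBOSE)


def parse_signature(signature):
    """Return a dict of the named groups, or None if the signature does not match."""
    match = SIGNATURE.match(signature)
    return match.groupdict() if match else None


plain = parse_signature(
    "public: void __thiscall myClass::method(int, class myOtherClass * )")
namespaced = parse_signature(
    "public: int __thiscall outer::inner::Widget::size(void)")
free_function = parse_signature("void free_function(int)")
```

Running each signature that fails through a helper like this and checking which group comes back wrong (or whether the result is `None`) is a quick way to find the part of the pattern that needs loosening.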

Without knowing the "odd exceptions" that it doesn't catch, it's hard to say how to improve the regex.

Thomas answered Sep 22 '22 04:09
