
Parsing signatures with regex, having "fun" with array return values

I have this [nasty] regex to capture a VBA procedure signature with all the parts in a bucket:

    public static string ProcedureSyntax
    {
        get
        {
            return
                @"(?:(?<accessibility>Friend|Private|Public)\s)?(?:(?<kind>Sub|Function|Property\s(Get|Let|Set)))\s(?<identifier>(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))\((?<parameters>.*)?\)(?:\sAs\s(?<reference>(((?<library>[a-zA-Z][a-zA-Z0-9_]*))\.)?(?<identifier>([a-zA-Z][a-zA-Z0-9_]*)|\[[a-zA-Z0-9_]*\]))(?<array>\((?<size>(([0-9]+)\,?\s?)*|([0-9]+\sTo\s[0-9]+\,?\s?)+)\))?)?";
        }
    }

Part of it is overkill and will match illegal array syntaxes (in the context of a procedure's signature), but that's not my concern right now.

The problem is that this part:

    \((?<parameters>.*)?\)

breaks when a function (or property getter) returns an array, because then the signature will look something like this:

    Public Function GetSomeArray() As Variant()

Or like this:

    Public Function GetSomeArray(ByVal foo As Integer) As Variant()

And that makes the function's return type completely borked, because the parameters capture group will pick up this:

    ByVal foo As Integer) As Variant(

I know why it's happening: my regex assumes that the last closing parenthesis is the one that ends the parameters capture group.
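The greedy capture can be reproduced in isolation. This is only a minimal sketch: it assumes a reduced pattern that keeps just the identifier and parameters groups of the full expression, and the `GreedyDemo` class and `CaptureParameters` method are names made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Reduced, illustrative pattern (an assumption, not the full ProcedureSyntax):
    // an identifier followed by a parenthesized, greedily captured parameter list.
    private const string Pattern =
        @"(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)\((?<parameters>.*)?\)";

    public static string CaptureParameters(string signature)
    {
        // .* runs to the end of the line, then backtracks to the LAST ')'.
        return Regex.Match(signature, Pattern).Groups["parameters"].Value;
    }

    public static void Main()
    {
        var signature = "Public Function GetSomeArray(ByVal foo As Integer) As Variant()";
        Console.WriteLine(CaptureParameters(signature));
        // prints: ByVal foo As Integer) As Variant(
    }
}
```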

Is there a way to fix my regex to change that, without impacting performance too much?

The catch is that this is a valid signature:

    Public Function DoSomething(foo As Integer, ParamArray bar()) As Variant()

I have another separate regex to handle individual parameters, and it would work great... if this one didn't get confused with array return types.

What I need is a parameters group that doesn't include the ) As Variant( part, just like the one I get when the return type isn't an array.

Mathieu Guindon asked Dec 15 '14 05:12


1 Answer

Here you go....

    (?:(?<accessibility>Friend|Private|Public)\s)?(?:(?<kind>Sub|Function|Property\s(Get|Let|Set)))\s(?<identifier>(?:[a-zA-Z][a-zA-Z0-9_]*)|(?:\[[a-zA-Z0-9_]*\]))\((?<parameters>(?:\(\)|[^()])*)?\)(?:\sAs\s(?<reference>(((?<library>[a-zA-Z][a-zA-Z0-9_]*))\.)?(?<identifier1>([a-zA-Z][a-zA-Z0-9_]*)|\[[a-zA-Z0-9_]*\]))(?<array>\((?<size>(([0-9]+)\,?\s?)*|([0-9]+\sTo\s[0-9]+\,?\s?)+)\))?)?


What are the changes made in your original regex?

I only changed the \((?<parameters>.*)?\) part of your original regex to \((?<parameters>(?:\(\)|[^()])*)?\) . The .* in your pattern matches greedily, up to the last ) on the line, whereas (?:\(\)|[^()])* matches, zero or more times, either a literal () pair or any single character other than ( or ). It therefore matches strings like foo or foo()bar and stops at the first ) that does not close an empty () pair. Note that the second identifier group is named identifier1 in this version.
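To check the change against the signatures from the question, here is a minimal sketch. It assumes a reduced pattern that keeps only the identifier and parameters groups, and the `ParametersDemo` class and `CaptureParameters` method are hypothetical names used for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParametersDemo
{
    // Reduced, illustrative pattern (an assumption, not the full expression):
    // the parameter list may contain empty "()" pairs or any non-parenthesis character.
    private const string Pattern =
        @"(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)\((?<parameters>(?:\(\)|[^()])*)?\)";

    public static string CaptureParameters(string signature)
    {
        return Regex.Match(signature, Pattern).Groups["parameters"].Value;
    }

    public static void Main()
    {
        string[] signatures =
        {
            "Public Function GetSomeArray() As Variant()",
            "Public Function GetSomeArray(ByVal foo As Integer) As Variant()",
            "Public Function DoSomething(foo As Integer, ParamArray bar()) As Variant()"
        };

        foreach (var signature in signatures)
        {
            // Brackets make the empty capture visible.
            Console.WriteLine("[" + CaptureParameters(signature) + "]");
        }
        // prints:
        // []
        // [ByVal foo As Integer]
        // [foo As Integer, ParamArray bar()]
    }
}
```

Because the sub-pattern only accepts empty () pairs, a parameter list that contains any other parenthesis (for example inside an optional parameter's default string value) would not be captured whole by this approach.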

Avinash Raj answered Oct 17 '22 04:10