
How do I tell an array from a procedure call?

Tags: parsing, antlr4

Context

I'm parsing VBA code, where...

  • This code outputs the element at index (i, 1) of the two-dimensional array a:

    Debug.Print a(i, 1)
    
  • This code outputs the result of function a given parameters i and 1:

    Debug.Print a(i, 1)
    
  • This code calls procedure DoSomething while evaluating foo as a value and passing it by value to the procedure (regardless of whether the signature has it as a "by reference" parameter):

    DoSomething (foo)
    
  • This code calls procedure DoSomething without evaluating foo as a value, and passing it by reference if the signature takes the parameter "by reference":

    Call DoSomething(foo)
    

So I have this lExpression parser rule that's problematic, because the first alternative (#indexExpr) matches both the array and the procedure call:

lExpression :
    lExpression whiteSpace? LPAREN whiteSpace? argumentList? whiteSpace? RPAREN                                     # indexExpr
    | lExpression mandatoryLineContinuation? DOT mandatoryLineContinuation? unrestrictedIdentifier                  # memberAccessExpr
    | lExpression mandatoryLineContinuation? EXCLAMATIONPOINT mandatoryLineContinuation? unrestrictedIdentifier     # dictionaryAccessExpr
    | ME                                                                                                            # instanceExpr
    | identifier                                                                                                    # simpleNameExpr
    | DOT mandatoryLineContinuation? unrestrictedIdentifier                                                         # withMemberAccessExpr
    | EXCLAMATIONPOINT mandatoryLineContinuation? unrestrictedIdentifier                                            # withDictionaryAccessExpr
;

The problem

The specific issue I'm trying to fix here is best illustrated by the stack trace from the parse exception that's thrown for this code:

Sub Test()
    DoSomething (foo), bar
End Sub

[Image: failing test stack trace]

I can see the callStmt() rule kicking in as it should, but then the expression that's meant to match DoSomething matches an #lExpr instead. That #lExpr swallows what should be the argument list and treats it as an array index.

Everything I've tried has failed, from moving #parenthesizedExpr up to a higher priority than #lExpr, to making a memberExpression rule and using that instead of expression in the callStmt rule. The project builds, but I end up with 1500 failing tests because nothing parses anymore.

The reason #lExpr matches DoSomething (foo) is specifically because, well, it's perfectly legal to have an indexExpr there - it's as if I needed some way to ignore a rule in the parsing, but only when I know that there's a callStmt in the lineage.

Is it even possible to disambiguate a(i, 1) (the array access) from a(i, 1) (the function call)?

If so... how?


Additional context

Here's the expression rule from which the lExpression rule is called:

expression :
    // Literal Expression has to come before lExpression, otherwise it'll be classified as simple name expression instead.
    literalExpression                                                                               # literalExpr
    | lExpression                                                                                   # lExpr
    | builtInType                                                                                   # builtInTypeExpr
    | LPAREN whiteSpace? expression whiteSpace? RPAREN                                              # parenthesizedExpr
    | TYPEOF whiteSpace expression                                                                  # typeofexpr        // To make the grammar SLL, the type-of-is-expression is actually the child of an IS relational op.
    | NEW whiteSpace expression                                                                     # newExpr
    | expression whiteSpace? POW whiteSpace? expression                                             # powOp
    | MINUS whiteSpace? expression                                                                  # unaryMinusOp
    | expression whiteSpace? (MULT | DIV) whiteSpace? expression                                    # multOp
    | expression whiteSpace? INTDIV whiteSpace? expression                                          # intDivOp
    | expression whiteSpace? MOD whiteSpace? expression                                             # modOp
    | expression whiteSpace? (PLUS | MINUS) whiteSpace? expression                                  # addOp
    | expression whiteSpace? AMPERSAND whiteSpace? expression                                       # concatOp
    | expression whiteSpace? (EQ | NEQ | LT | GT | LEQ | GEQ | LIKE | IS) whiteSpace? expression    # relationalOp
    | NOT whiteSpace? expression                                                                    # logicalNotOp
    | expression whiteSpace? AND whiteSpace? expression                                             # logicalAndOp
    | expression whiteSpace? OR whiteSpace? expression                                              # logicalOrOp
    | expression whiteSpace? XOR whiteSpace? expression                                             # logicalXorOp
    | expression whiteSpace? EQV whiteSpace? expression                                             # logicalEqvOp
    | expression whiteSpace? IMP whiteSpace? expression                                             # logicalImpOp
    | HASH expression                                                                               # markedFileNumberExpr // Added to support special forms such as Input(file1, #file1)
;

And the callStmt rule, which is meant to pick up only procedure calls (which may or may not be preceded by a Call keyword):

callStmt :
    CALL whiteSpace expression
    | expression (whiteSpace argumentList)?
;
Mathieu Guindon asked Nov 14 '16 06:11



2 Answers

(I've built VB6/VBA parsers.)

No, with a pure context-free parsing engine you can't distinguish them at parse time, precisely because the syntax for a function call and the syntax for an array access are identical.

The simple thing to do is to parse the construct as array_access_or_function_call and disambiguate it after parsing, by post-processing the tree: build a symbol table, look up the declaration of the entity whose scope contains the reference, and use that declaration to decide which it is.
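The parse-then-resolve idea can be sketched roughly as follows. This is a Python illustration, not ANTLR output: the node type IndexOrCall, the Scope class, and the Kind enum are all hypothetical names, and a real resolver would build its symbol table by walking the declarations in the parse tree rather than by hand.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Kind(Enum):
    ARRAY = auto()
    FUNCTION = auto()
    UNKNOWN = auto()   # no declaration found, or not decidable statically


@dataclass
class IndexOrCall:
    """Ambiguous node the parser emits for `name(args)` (hypothetical shape)."""
    name: str
    args: list
    resolved: Kind = Kind.UNKNOWN


@dataclass
class Scope:
    """One level of a symbol table; lookups fall back to the parent scope."""
    symbols: dict = field(default_factory=dict)
    parent: "Scope | None" = None

    def declare(self, name: str, kind: Kind) -> None:
        # VBA identifiers are case-insensitive, so store them lower-cased.
        self.symbols[name.lower()] = kind

    def lookup(self, name: str) -> Kind:
        scope = self
        while scope is not None:
            if name.lower() in scope.symbols:
                return scope.symbols[name.lower()]
            scope = scope.parent
        return Kind.UNKNOWN


def resolve(node: IndexOrCall, scope: Scope) -> IndexOrCall:
    """Post-parse pass: classify the node, then recurse into nested arguments."""
    node.resolved = scope.lookup(node.name)
    for arg in node.args:
        if isinstance(arg, IndexOrCall):
            resolve(arg, scope)
    return node


# Hand-built symbol table standing in for a declaration-collecting pass.
module = Scope()
module.declare("GetValue", Kind.FUNCTION)
proc = Scope(parent=module)
proc.declare("a", Kind.ARRAY)

# Tree for: a(i, getvalue(1))
tree = IndexOrCall("a", ["i", IndexOrCall("getvalue", [1])])
resolve(tree, proc)
print(tree.resolved.name, tree.args[1].resolved.name)
```

The grammar stays context-free because the parser only ever produces the ambiguous node; all knowledge about declarations lives in the later pass.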

This problem isn't unique to VB; C and C++ famously have a similar problem. The solution used in most C/C++ parsers is to have the parser collect declaration information as a side effect as it parses, and then consult that information when it encounters the instance syntax to decide.
This approach changes the parser into a context-sensitive one. The downside is that it tangles (at least partial) symbol table building with parsing, and your parsing engine may or may not cooperate, which makes this more or less awkward to implement.

(I think ANTLR will let you call arbitrary code at various points in the parsing process, which can be used to save declaration information, and will let you call parse-time predicates to help guide the parser; these should be enough.)
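The context-sensitive alternative, where the parser records declarations as a side effect and consults them when it meets name(...), might look roughly like this. It is a Python sketch rather than an ANTLR grammar with actions and predicates: the regular expressions stand in for real grammar rules, and parse, DECL_ARRAY, DECL_FUNC, and USE are hypothetical names.

```python
import re

# Crude stand-ins for grammar rules (assumptions, not real VBA coverage).
DECL_ARRAY = re.compile(r"^\s*Dim\s+(\w+)\s*\(", re.IGNORECASE)
DECL_FUNC = re.compile(r"^\s*(?:Public\s+|Private\s+)?Function\s+(\w+)", re.IGNORECASE)
USE = re.compile(r"(\w+)\s*\(")


def parse(lines):
    """Single pass: record declarations as they appear, classify uses on sight."""
    declared = {}      # lower-cased name -> "array" | "function"
    classified = []    # (name, kind) for every `name(` in a non-declaration line
    for line in lines:
        m = DECL_ARRAY.match(line)
        if m:
            declared[m.group(1).lower()] = "array"
            continue
        m = DECL_FUNC.match(line)
        if m:
            declared[m.group(1).lower()] = "function"
            continue
        for name in USE.findall(line):
            # The "predicate": consult only what has been declared so far.
            classified.append((name, declared.get(name.lower(), "unknown")))
    return classified


source = [
    "Function Twice(x)",
    "Dim a(10, 2)",
    "Debug.Print a(i, 1)",
    "Debug.Print Twice(3)",
    "Debug.Print Later(3)",   # used before its declaration is seen
    "Function Later(x)",
]
result = parse(source)
for name, kind in result:
    print(name, kind)
```

The last use comes back as "unknown" because its declaration appears later in the source. VBA allows a procedure to be called before the point where it is defined in the module, so a single-pass scheme like this one needs either a pre-pass over declarations or a fallback to post-parse resolution.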

I prefer the parse-then-resolve approach because it is cleaner and more maintainable.

Ira Baxter answered Oct 31 '22 19:10


You can't tell an array from a procedure call. Even at resolution time, you still can't necessarily know, as the sub-type of the variable might change as late as run-time.

This example shows the impact of default members that accept optional arguments:

  Dim var As Variant

  Set var = Range("A1:B2")
  Debug.Print var(1, 1)     'Accesses the _Default/Item property with index arguments

  var = var                 'Accesses the _Default/Item property without arguments
  Debug.Print var(1, 1)     'Array indices

You can't even reliably tell whether a second parenthesized list, applied to the result of the first, is a procedure call or an array index:

  Dim var1 As Variant
  Set var1 = New Dictionary
  Dim var2 As Variant
  Set var2 = New Dictionary
  var2.Add 0, "Foo"
  var1.Add 0, var2
  Debug.Print var1(0)(0)    'Accesses the default/Item of the default/Item

  var1 = Array(Array(1))
  Debug.Print var1(0)(0)    'Accesses the first index of the first index

You'll need to treat a parenthesized block that follows a variable name as possibly belonging to either a procedure or an array. In fact, it might even be useful to think of accessing an array member as if the array had a default Item member. That way, an array is no different from an object with a default member that requires arguments that happen to be indices (and that happens to have dedicated constructor syntaxes).
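One way to picture that unified model is the Python sketch below. It is not VBA and not how any particular VBA tool represents values: VbaArray, VbaDictionary, default_member, and index_or_call are hypothetical names, chosen to show that a single dispatch path can serve both the Dictionary case and the Array case from the examples above.

```python
class VbaArray:
    """Array modeled as an object whose default member takes indices."""

    def __init__(self, items):
        self._items = items

    def default_member(self, *indices):
        value = self._items
        for i in indices:
            value = value[i]
        # Nested arrays stay wrapped so they can be indexed again.
        return VbaArray(value) if isinstance(value, list) else value


class VbaDictionary:
    """Stand-in for a Dictionary object; Item plays the default member."""

    def __init__(self):
        self._items = {}

    def add(self, key, item):
        self._items[key] = item

    def default_member(self, key):
        return self._items[key]


def index_or_call(target, *args):
    """Evaluate `target(args)` without caring whether target is an array."""
    return target.default_member(*args)


# var1(0)(0) where var1 holds a dictionary of dictionaries
var2 = VbaDictionary()
var2.add(0, "Foo")
var1 = VbaDictionary()
var1.add(0, var2)
from_dict = index_or_call(index_or_call(var1, 0), 0)

# var1(0)(0) where var1 holds Array(Array(1))
var1 = VbaArray([[1]])
from_array = index_or_call(index_or_call(var1, 0), 0)

print(from_dict, from_array)
```

Under this view the parser and the resolver only need one node type for name(args); which member actually runs is decided from the value held at run time.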

ThunderFrame answered Oct 31 '22 18:10