Parsing of optionals with PEG (Grako) falling short?

My colleague PaulS asked me the following:


I'm writing a parser for an existing language (SystemVerilog, an IEEE standard), and the specification has a rule in it that is similar in structure to this:

cover_point 
    = 
    [[data_type] identifier ':' ] 'coverpoint' identifier ';' 
    ;

data_type 
    = 
    'int' | 'float' | identifier 
    ;

identifier 
    = 
    ?/\w+/? 
    ;

The problem is that when parsing the following legal string:

anIdentifier: coverpoint another_identifier;

anIdentifier successfully matches data_type (via its identifier alternative), so Grako then looks for another identifier after it and fails. It doesn't go back and try to parse without the data_type part.
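To make the failure concrete, here is a minimal sketch in Python that hand-codes the rule with tiny PEG-style combinators. It assumes nothing about Grako's internals; the helper names (token, identifier, seq, choice, optional, parses) are hypothetical. Each parser returns the position where it stops, or None on failure, and leading whitespace is skipped before every token.

```python
import re

WS = re.compile(r"\s*")
WORD = re.compile(r"\w+")

def skip_ws(text, pos):
    return WS.match(text, pos).end()

def token(lit):
    # Match the literal after skipping whitespace; return the end position or None.
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(lit) if text.startswith(lit, pos) else None
    return parse

def identifier(text, pos):
    # ?/\w+/?
    m = WORD.match(text, skip_ws(text, pos))
    return m.end() if m else None

def seq(*ps):
    # All parsers in order; fail as soon as one fails.
    def parse(text, pos):
        for p in ps:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*ps):
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in ps:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def optional(p):
    """PEG [p]: if p succeeds its result is final; otherwise match nothing."""
    def parse(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return parse

data_type = choice(token("int"), token("float"), identifier)

# cover_point = [[data_type] identifier ':'] 'coverpoint' identifier ';'
cover_point = seq(
    optional(seq(optional(data_type), identifier, token(":"))),
    token("coverpoint"), identifier, token(";"),
)

def parses(rule, text):
    # True if the rule matches the whole input (trailing whitespace allowed).
    end = rule(text, 0)
    return end is not None and skip_ws(text, end) == len(text)

print(parses(cover_point, "int x: coverpoint y;"))                          # True
print(parses(cover_point, "anIdentifier: coverpoint another_identifier;"))  # False
```

The inner [data_type] consumes anIdentifier, the following identifier then fails on ':', and the whole outer option falls back to matching nothing, so 'coverpoint' is expected at position 0 and the parse fails.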

I can rewrite the rule as follows,

cover_point_rewrite  
    = 
    [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';' 
    ;

but I wonder:

  1. whether this is intentional, and
  2. whether there's a better syntax.

Is this a PEG-in-general issue, or a tool (Grako) one?

asked Oct 31 '22 at 20:10 by Apalala


1 Answer

In PEGs the choice operator is ordered: to avoid the ambiguities of CFGs, the parser commits to the first alternative that matches.
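As an illustration of ordered choice (a sketch, not Grako's implementation; ordered_choice and DATA_TYPE_ALTERNATIVES are hypothetical names), data_type can be tried one alternative at a time. For the input int, both 'int' and identifier would match, which a CFG would treat as two possible derivations; the PEG simply takes the first one that succeeds.

```python
import re

WS = re.compile(r"\s*")
WORD = re.compile(r"\w+")

def skip_ws(text, pos):
    return WS.match(text, pos).end()

def token(lit):
    # Match the literal after skipping whitespace; return the end position or None.
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(lit) if text.startswith(lit, pos) else None
    return parse

def identifier(text, pos):
    # ?/\w+/?
    m = WORD.match(text, skip_ws(text, pos))
    return m.end() if m else None

# data_type = 'int' | 'float' | identifier, in this order (hypothetical encoding)
DATA_TYPE_ALTERNATIVES = [
    ("int", token("int")),
    ("float", token("float")),
    ("identifier", identifier),
]

def ordered_choice(alternatives, text, pos=0):
    """Return (name, end) of the first alternative that matches at pos, or None.

    Once an alternative succeeds, later ones are never consulted."""
    for name, parse in alternatives:
        end = parse(text, pos)
        if end is not None:
            return name, end
    return None

print(ordered_choice(DATA_TYPE_ALTERNATIVES, "int"))     # ('int', 3)
print(ordered_choice(DATA_TYPE_ALTERNATIVES, "myType"))  # ('identifier', 6)
print(identifier("int", 0))  # 3: identifier alone would also accept "int"
```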

In your first example,

[data_type]
succeeds by consuming anIdentifier as a data_type, so the parse fails when it then finds ':' instead of another identifier. This is because [data_type] behaves like (data_type | ε): once data_type matches the first identifier, the empty alternative is never tried.
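For contrast, here is a naive recognizer (hypothetical, and not how Grako or any PEG tool works) that reads [x] and | the way a context-free grammar does: it keeps every possible end position instead of committing to the first success. Identifiers are still matched greedily, as a lexer would. Under these semantics the original rule accepts the input that the PEG version rejects, which is consistent with this being a property of PEGs in general rather than of Grako.

```python
import re

WS = re.compile(r"\s*")
WORD = re.compile(r"\w+")

def skip_ws(text, pos):
    return WS.match(text, pos).end()

# Each recognizer returns the SET of every position where it can end.
def token(lit):
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return {pos + len(lit)} if text.startswith(lit, pos) else set()
    return parse

def identifier(text, pos):
    m = WORD.match(text, skip_ws(text, pos))
    return {m.end()} if m else set()

def seq(*ps):
    # Thread every possible end position through each parser in turn.
    def parse(text, pos):
        ends = {pos}
        for p in ps:
            ends = {end for start in ends for end in p(text, start)}
        return ends
    return parse

def alt(*ps):
    """Unordered alternation: keep the results of every alternative."""
    def parse(text, pos):
        return set().union(*(p(text, pos) for p in ps))
    return parse

def opt(p):
    """[p] read as (p | empty), keeping BOTH outcomes instead of committing."""
    def parse(text, pos):
        return p(text, pos) | {pos}
    return parse

data_type = alt(token("int"), token("float"), identifier)

# Same rule as in the question: [[data_type] identifier ':'] 'coverpoint' identifier ';'
cover_point = seq(
    opt(seq(opt(data_type), identifier, token(":"))),
    token("coverpoint"), identifier, token(";"),
)

def accepts(rule, text):
    return any(skip_ws(text, end) == len(text) for end in rule(text, 0))

print(accepts(cover_point, "anIdentifier: coverpoint another_identifier;"))  # True
```

This exhaustive search is for illustration only: it can take exponential time on larger grammars.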

In

[data_type identifier ':' | identifier ':' ]
the first alternative fails as a whole when there is no second identifier, so the parser backtracks to the start of the option and tries the second alternative.
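A minimal sketch of the rewritten rule, using hypothetical hand-written combinators rather than Grako, makes the backtracking visible: traced_choice records each alternative it tries, and both attempts start at the same position.

```python
import re

WS = re.compile(r"\s*")
WORD = re.compile(r"\w+")

def skip_ws(text, pos):
    return WS.match(text, pos).end()

def token(lit):
    # Match the literal after skipping whitespace; return the end position or None.
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(lit) if text.startswith(lit, pos) else None
    return parse

def identifier(text, pos):
    m = WORD.match(text, skip_ws(text, pos))
    return m.end() if m else None

def seq(*ps):
    def parse(text, pos):
        for p in ps:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def optional(p):
    def parse(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return parse

_DATA_TYPE = (token("int"), token("float"), identifier)

def data_type(text, pos):
    # 'int' | 'float' | identifier, tried in order
    for p in _DATA_TYPE:
        end = p(text, pos)
        if end is not None:
            return end
    return None

def traced_choice(*named_alts, log):
    """PEG ordered choice that records (alternative, start, end-or-None) per try."""
    def parse(text, pos):
        for name, p in named_alts:
            end = p(text, pos)  # every alternative restarts at the same pos
            log.append((name, pos, end))
            if end is not None:
                return end
        return None
    return parse

log = []
cover_point_rewrite = seq(
    optional(traced_choice(
        ("data_type identifier ':'", seq(data_type, identifier, token(":"))),
        ("identifier ':'", seq(identifier, token(":"))),
        log=log,
    )),
    token("coverpoint"), identifier, token(";"),
)

text = "anIdentifier: coverpoint another_identifier;"
end = cover_point_rewrite(text, 0)
print(end == len(text))  # True
for entry in log:
    print(entry)
# ("data_type identifier ':'", 0, None)
# ("identifier ':'", 0, 13)
```

The first alternative consumes anIdentifier as a data_type and then fails; the choice discards that partial progress, and the second alternative succeeds, ending just after the ':'.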
answered Nov 15 '22 at 06:11 by 1010