Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ANTLR4.7: rule XXX contains a closure with at least one alternative that can match an empty string'

I am trying to create a grammar to match content like below:

(For a simple grammar to repro this issue please see ADD 1)

[Defines]
  INF_VERSION                    = 0x00010005
  BASE_NAME                      = WebServer
  FILE_GUID                      = 99E87DCF-6162-40c5-9FA1-32111F5197F7
  MODULE_TYPE                    = SEC
  UEFI_SPECIFICATION_VERSION     = 0x00010005

The UEFI_SPECIFICATION_VERSION = 0x00010005 part is optional.

(for brevity, I omitted some of the grammar).

My grammar 1 looks like this:

defines : '[Defines]'
         define_statement+
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
                  ;

ANTLR 4.7 reports this error:

message: 'rule defines contains a closure with at least one alternative that can match an empty string'

But if I changed grammar like this:

defines : '[Defines]'
         define_statement+
         | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType

The error is gone.

My question is, what does the closure mean? Which part is the closure? The define_statement?

After I move the potentially empty alternative, the defines rule can alternate between '[Defines]' define_statement+ and ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?, which means defines can still match empty string. How could the error be gone?

ADD 1

To make things more clear, I repro this error with a simplified grammar:

grammar test;

rule : alternate+; // <<<<< HERE
alternate : '1'?;

If I use + or * at HERE, ANTLR will report an error:

'rule rule contains a closure with at least one alternative that can match an empty string'

If I use ? at HERE, ANTLR will report a warning:

'rule rule contains an optional block with at least one alternative that can match an empty string'

I am still not sure why.

ADD 2

Each of the alternate WILL be a child node of rule, so if alternate can be empty string, then it is logically possible to lead to endless child nodes for rule. So I guess this may explain why ANTLR forbids me to do that with alternate+ or alternate*. But if it is with alternate?, at most there will be one child node. It's only a performance issue. So ANTLR just generate a warning.

like image 405
smwikipedia Avatar asked Jul 31 '17 08:07

smwikipedia


2 Answers

Let's start with the warning. The application is merely alerting you that something can be matched by the empty string. This is a warning because most of the time, you don't want tokens to match to the empty string.

defines : '[Defines]'
         define_statement+
         ;

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
                  ;

Since ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal) is optional (it is followed by ?, it could be replace by nothing, like this:

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | 
                  ;

That last | by itself means the rule can match nothing, or the empty string. So the mystery about the warning is solved. They call it a closure, but you could think of it as a "token binding" or "match." I don't think the terminology is all that important in a practical sense.

The error goes away if you remove the alternative, because then, again rewriting for clarity, we have:

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  ;

And there's nothing optional there. One of those has to match.

You've already mentioned in your comments that you understand why moving the rule to its own rule -- that can potentially match an infinite number of empty strings -- is a bad, idea, so I won't belabor that here.

But why did the error go away when you did that? Because

defines : '[Defines]'
         define_statement+
         | ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
         ;

is guaranteed to match something, even if it's only the token [Defines] , which is an implicit lexer token. So even if the UEFI thing is empty string, there's still something to parse. That wasn't true in the first version we examined; indeed the whole define_statement rule there could have been an empty string. That's quite a difference from a parsing standpoint.

Now the big question: Is the [Defines] section truly optional, or not? Only you can answer that. But if it is, perhaps you should just recode it as:

defines : ('[Defines]' define_statement+)?

define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  | 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal

This makes it completely optional. Again, only you can decide if this is valid for your grammar and expected input.

Make sense? I hope I helped you!

EDIT 1

To relieve the error, try this grammar (I made explicit tokens for the test values to get it to run):

grammar Uefi;
defines : '[Defines]' statement+ ;
statement : define_statement | uefi_statement ;      
uefi_statement : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal ;
define_statement  : 'INF_VERSION' EQ SpecVersion_VersionVal 
                  | 'BASE_NAME' EQ BaseName
                  | 'FILE_GUID' EQ RegistryFormatGUID
                  | 'MODULE_TYPE' EQ Edk2ModuleType
                  ;
// DUMMY VALUES               
SpecVersion_VersionVal : '0x00010005';
BaseName : 'WebServer';
RegistryFormatGUID : '99E87DCF-6162-40c5-9FA1-32111F5197F7';
Edk2ModuleType : 'SEC';
EQ : '=';
WS : [ \t\r\n]+ -> skip;

enter image description here

like image 119
TomServo Avatar answered Sep 24 '22 13:09

TomServo


Just add my solution. Credit goes to @JLH.

The important thing to get is to have 2 separate rules for lines with different natures.

  • linesGroup1_Defines
  • linesGroup2_Defines.

This way, the optional nature of a line can be reached through | (choice) instead of ?(optional).

grammar inf;

start : configSections;

configSections: configSection+
                EOF;

configSection: section_Defines
             | bSection
             ;

section_Defines : '[Defines]'
                 sectionLine_Defines*;

sectionLine_Defines  : linesGroup1_Defines | linesGroup2_Defines;

linesGroup1_Defines : 'INF_VERSION' EQ SpecVersion_VersionVal 
           | 'BASE_NAME' EQ BaseName
           | 'FILE_GUID' EQ RegistryFormatGUID
           | 'MODULE_TYPE' EQ Edk2ModuleType
           ;
linesGroup2_Defines : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
              | 'PI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
              ;


bSection : '[b]'
           SectionLine_b+;

(Some necessary token definitions are omitted for brevity)

enter image description here

ADD 1

On a second thought, with the above solution, I didn't cover the semantic that linesGroup1_Deines is mandatory and linsGroup2_Defines is optional. Actually both are optional now. It can accept input with only optional lines like below:

[Defines]
  UEFI_SPECIFICATION_VERSION     = 0x00010005

I am not sure if this semantic can/should be covered in the grammar. Maybe I need to further refine it.

like image 27
smwikipedia Avatar answered Sep 25 '22 13:09

smwikipedia