I am trying to create a grammar to match content like below:
(For a simple grammar to repro this issue please see ADD 1)
[Defines]
INF_VERSION = 0x00010005
BASE_NAME = WebServer
FILE_GUID = 99E87DCF-6162-40c5-9FA1-32111F5197F7
MODULE_TYPE = SEC
UEFI_SPECIFICATION_VERSION = 0x00010005
The UEFI_SPECIFICATION_VERSION = 0x00010005
part is optional.
(for brevity, I omitted some of the grammar).
My grammar 1 looks like this:
defines : '[Defines]'
define_statement+
;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
| ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
;
ANTLR 4.7 reports this error:
message: 'rule defines contains a closure with at least one alternative that can match an empty string'
But if I changed grammar like this:
defines : '[Defines]'
define_statement+
| ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
The error is gone.
My question is, what does the closure
mean? Which part is the closure
? The define_statement
?
After I move the potentially empty alternative, the defines
rule can alternate between '[Defines]' define_statement+
and ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
, which means defines
can still match empty string. How could the error be gone?
To make things more clear, I repro this error with a simplified grammar:
grammar test;
rule : alternate+; // <<<<< HERE
alternate : '1'?;
If I use +
or *
at HERE
, ANTLR will report an error:
'rule rule contains a closure with at least one alternative that can match an empty string'
If I use ?
at HERE
, ANTLR will report a warning:
'rule rule contains an optional block with at least one alternative that can match an empty string'
I am still not sure why.
Each of the alternate
WILL be a child node of rule
, so if alternate
can be empty string, then it is logically possible to lead to endless child nodes for rule
. So I guess this may explain why ANTLR forbids me to do that with alternate+
or alternate*
. But if it is with alternate?
, at most there will be one child node. It's only a performance issue. So ANTLR just generate a warning.
Let's start with the warning. The application is merely alerting you that something can be matched by the empty string. This is a warning because most of the time, you don't want tokens to match to the empty string.
defines : '[Defines]'
define_statement+
;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
| ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)?
;
Since ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal) is optional (it is followed by ?
, it could be replace by nothing, like this:
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
|
;
That last |
by itself means the rule can match nothing, or the empty string. So the mystery about the warning is solved. They call it a closure, but you could think of it as a "token binding" or "match." I don't think the terminology is all that important in a practical sense.
The error goes away if you remove the alternative, because then, again rewriting for clarity, we have:
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
;
And there's nothing optional there. One of those has to match.
You've already mentioned in your comments that you understand why moving the rule to its own rule -- that can potentially match an infinite number of empty strings -- is a bad, idea, so I won't belabor that here.
But why did the error go away when you did that? Because
defines : '[Defines]'
define_statement+
| ('UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal)? // <<< HERE
;
is guaranteed to match something, even if it's only the token [Defines]
, which is an implicit lexer token. So even if the UEFI thing is empty string, there's still something to parse. That wasn't true in the first version we examined; indeed the whole define_statement
rule there could have been an empty string. That's quite a difference from a parsing standpoint.
Now the big question: Is the [Defines]
section truly optional, or not? Only you can answer that. But if it is, perhaps you should just recode it as:
defines : ('[Defines]' define_statement+)?
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
| 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
This makes it completely optional. Again, only you can decide if this is valid for your grammar and expected input.
Make sense? I hope I helped you!
To relieve the error, try this grammar (I made explicit tokens for the test values to get it to run):
grammar Uefi;
defines : '[Defines]' statement+ ;
statement : define_statement | uefi_statement ;
uefi_statement : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal ;
define_statement : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
;
// DUMMY VALUES
SpecVersion_VersionVal : '0x00010005';
BaseName : 'WebServer';
RegistryFormatGUID : '99E87DCF-6162-40c5-9FA1-32111F5197F7';
Edk2ModuleType : 'SEC';
EQ : '=';
WS : [ \t\r\n]+ -> skip;
Just add my solution. Credit goes to @JLH.
The important thing to get is to have 2 separate rules for lines with different natures.
linesGroup1_Defines
linesGroup2_Defines
.This way, the optional nature
of a line can be reached through |
(choice) instead of ?
(optional).
grammar inf;
start : configSections;
configSections: configSection+
EOF;
configSection: section_Defines
| bSection
;
section_Defines : '[Defines]'
sectionLine_Defines*;
sectionLine_Defines : linesGroup1_Defines | linesGroup2_Defines;
linesGroup1_Defines : 'INF_VERSION' EQ SpecVersion_VersionVal
| 'BASE_NAME' EQ BaseName
| 'FILE_GUID' EQ RegistryFormatGUID
| 'MODULE_TYPE' EQ Edk2ModuleType
;
linesGroup2_Defines : 'UEFI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
| 'PI_SPECIFICATION_VERSION' EQ SpecVersion_VersionVal
;
bSection : '[b]'
SectionLine_b+;
(Some necessary token definitions are omitted for brevity)
On a second thought, with the above solution, I didn't cover the semantic that linesGroup1_Deines
is mandatory and linsGroup2_Defines
is optional. Actually both are optional now. It can accept input with only optional lines like below:
[Defines]
UEFI_SPECIFICATION_VERSION = 0x00010005
I am not sure if this semantic can/should be covered in the grammar. Maybe I need to further refine it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With