How can I construct a clean, Python-like grammar in ANTLR?

Tags: grammar, antlr

G'day!

How can I construct a simple ANTLR grammar handling multi-line expressions without the need for either semicolons or backslashes?

I'm trying to write a simple DSL for expressions:

# sh style comments
ThisValue = 1
ThatValue = ThisValue * 2
ThisOtherValue = (1 + 2 + ThisValue * ThatValue)
YetAnotherValue = MAX(ThisOtherValue, ThatValue)

Overall, I want my application to provide the script with some initial named values and pull out the final result. I'm getting hung up on the syntax, however. I'd like to support multi-line expressions like the following:

# Note: no backslashes required to continue expression, as we're in brackets
# Note: no semicolon required at end of expression, either
ThisValueWithAReallyLongName = (ThisOtherValueWithASimilarlyLongName
                               +AnotherValueWithAGratuitouslyLongName)

I started off with an ANTLR grammar like this:

exprlist
    : ( assignment_statement | empty_line )* EOF!
    ;
assignment_statement
    : assignment NL!?
    ;
empty_line
    : NL;
assignment
    : ID '=' expr
    ;

// ... and so on

It seems simple, but I'm already in trouble with the newlines:

warning(200): StackOverflowQuestion.g:11:20: Decision can match input such as "NL" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

Graphically, in org.antlr.works.IDE:

Decision Can Match NL Using Multiple Alternatives http://img.skitch.com/20090723-ghpss46833si9f9ebk48x28b82.png

I've kicked the grammar around, but I always end up violating at least one of the behaviors I expect:

A newline is not required at the end of the file
Empty lines are acceptable
Everything in a line from a pound sign onward is discarded as a comment
Assignments end with end-of-line, not semicolons
Expressions can span multiple lines if wrapped in brackets

I can find example ANTLR grammars with many of these characteristics. I find that when I cut them down to limit their expressiveness to just what I need, I end up breaking something. Others are too simple, and I break them as I add expressiveness.

Which angle should I take with this grammar? Can you point to any examples that aren't either trivial or full Turing-complete languages?

asked Jul 23 '09 03:07

Garth Kidd

1 Answer

I would let your tokenizer do the heavy lifting rather than mixing your newline rules into your grammar:

Count parentheses, brackets, and braces, and don't generate NL tokens while any group is still unclosed. That gives you line continuations for free, without the grammar being any the wiser.
Always generate an NL token at the end of the file, whether or not the last line ends with a '\n' character. Then you don't have to special-case a final statement with no NL: statements always end with an NL.
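
To make the two tokenizer rules above concrete, here is a minimal sketch of them as a small hand-written Python tokenizer. It is an illustration under assumptions, not ANTLR's API: the token names (`COMMENT`, `WS`, `NUMBER`, `ID`, `OPEN`, `CLOSE`, `OP`, `NL`), the regular expression, and the `tokenize` function are all invented for this example. In an ANTLR lexer the same idea would presumably be a nesting counter kept in lexer actions.

```python
import re

# Hypothetical token pattern for this sketch; not taken from the original grammar.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>\#[^\n]*)     # sh-style comment, discarded
  | (?P<NL>\n)                # newline, significant only outside brackets
  | (?P<WS>[ \t\r]+)          # horizontal whitespace, discarded
  | (?P<NUMBER>\d+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OPEN>[(\[{])
  | (?P<CLOSE>[)\]}])
  | (?P<OP>[-+*/=,])
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, value) pairs, hiding newlines inside unclosed brackets."""
    tokens = []
    depth = 0  # number of currently unclosed (, [ or {
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "WS"):
            continue
        if kind == "OPEN":
            depth += 1
        elif kind == "CLOSE":
            depth = max(depth - 1, 0)
        elif kind == "NL" and depth > 0:
            continue  # line continuation: swallow the newline
        tokens.append((kind, value))
    # Make sure the stream ends with NL, even without a trailing '\n'.
    if not tokens or tokens[-1][0] != "NL":
        tokens.append(("NL", "\n"))
    return tokens
```

Fed the two-line bracketed assignment from the question, this sketch emits a single NL token, at the very end, so a parser downstream never sees the line break inside the brackets.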

The second point would let you simplify your grammar to something like this:

exprlist
    : ( assignment_statement | empty_line )* EOF!
    ;
assignment_statement
    : assignment NL
    ;
empty_line
    : NL
    ;
assignment
    : ID '=' expr
    ;
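
As a rough illustration of why the simplified grammar is easy to parse, here is a hand-written Python sketch that walks a token list in the shape of those rules. Everything in it is an assumption made for the example: the `(kind, value)` token tuples, the `parse_exprlist` name, and especially the treatment of `expr`, which the original grammar does not spell out and which is approximated here as "every token up to the next NL".

```python
def parse_exprlist(tokens):
    """Sketch of: exprlist : ( assignment NL | NL )* EOF

    `tokens` is a list of (kind, value) tuples (a hypothetical format).
    Returns a list of (name, expr_tokens) pairs, one per assignment.
    """
    assignments = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "NL":  # empty_line : NL
            i += 1
            continue
        # assignment : ID '=' expr
        if kind != "ID":
            raise SyntaxError(f"expected ID, got {value!r}")
        if i + 1 >= len(tokens) or tokens[i + 1] != ("OP", "="):
            raise SyntaxError(f"expected '=' after {value!r}")
        i += 2
        # Placeholder for expr: collect everything up to the next NL.
        expr = []
        while i < len(tokens) and tokens[i][0] != "NL":
            expr.append(tokens[i][1])
            i += 1
        if not expr:
            raise SyntaxError(f"missing expression for {value!r}")
        if i >= len(tokens):
            raise SyntaxError("statement not terminated by NL")
        i += 1  # consume the NL that ends the statement
        assignments.append((value, expr))
    return assignments
```

Because every statement is required to end in NL, the loop never has to decide whether a newline belongs to the current statement or to an empty line, which is the decision the original `NL!?` made ambiguous.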

answered Nov 05 '22 12:11

John Kugelman
