I understand the theory behind separating parser rules and lexer rules, but what are the practical differences between these two statements in ANTLR:
my_rule: ... ;
MY_RULE: ... ;
Do they result in different AST trees? Different performance? Potential ambiguities?
A lexer is a software program that performs lexical analysis. ... A parser goes one level further than the lexer and takes the tokens produced by the lexer and tries to determine if proper sentences have been formed. Parsers work at the grammatical level, lexers work at the word level.
ANTLR (ANother Tool for Language Recognition) is a lexer and parser generator aimed at building and walking parse trees. It makes it straightforward to parse nontrivial text input such as a programming language's syntax.
Lexer rules allow you to match context-free structures on the input character stream, as opposed to the much weaker regular structures that a DFA (deterministic finite automaton) can recognize: ANTLR lexer rules may refer to other lexer rules, and even to themselves recursively.
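As a minimal sketch of that point (the rule name and the ANTLR4 syntax are my own, not taken from the answer above), a recursive lexer rule can match Pascal-style nested comments, something a single regular expression cannot express:

    // Hypothetical ANTLR4 lexer rule: matches (* ... *) comments that may be nested
    // inside each other, a context-free structure beyond the power of a DFA.
    NESTED_COMMENT : '(*' ( NESTED_COMMENT | . )*? '*)' -> skip ;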
A lexer will take an input character stream and convert it into tokens. This can be used for a variety of purposes. You could apply transformations to the lexemes for simple text processing and manipulation. Or the stream of lexemes can be fed to a parser, which will convert it into a parse tree.
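For instance, a toy combined grammar (the names and rule bodies are invented purely for illustration) shows both halves of that pipeline: the lexer rules cut the character stream into tokens, and the parser rule arranges those tokens into a tree:

    grammar Assign;              // hypothetical example grammar
    stat : ID '=' INT ';' ;      // parser rule: consumes the token stream
    ID   : [a-zA-Z_]+ ;          // lexer rules: carve tokens out of raw characters
    INT  : [0-9]+ ;
    WS   : [ \t\r\n]+ -> skip ;  // whitespace is discarded before the parser sees it

Given the input "x = 42;", the lexer would emit roughly ID(x) '=' INT(42) ';', and the parser would then match that token stream against stat.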
... what are the practical differences between these two statements in ANTLR ...
MY_RULE
will be used to tokenize your input source. It represents a fundamental building block of your language.
my_rule
is called from the parser; it consists of zero or more other parser rules or tokens produced by the lexer.
That's the difference.
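To make that concrete (the rule bodies below are invented, since the question elides them with ...), the two declarations could look like this:

    MY_RULE : [a-z]+ ;                 // lexer rule: matched against raw characters, emits MY_RULE tokens
    my_rule : MY_RULE ',' MY_RULE ;    // parser rule: matched against tokens the lexer already produced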
Do they result in different AST trees? Different performance? ...
The parser builds the AST using tokens produced by the lexer, so the questions make no sense (to me). A lexer merely "feeds" the parser a one-dimensional stream of tokens.
This post may be helpful:
The lexer is responsible for the first step, and its only job is to create a "token stream" from text. It is not responsible for understanding the semantics of your language; it is only interested in understanding the syntax of your language.
For example, syntax is the rule that an identifier must only use letters, digits, and underscores, as long as it doesn't start with a digit. The responsibility of the lexer is to understand this rule. In this case, the lexer would accept the sequence of characters "asd_123" but reject the characters "12dsadsa" (assuming that there isn't another rule in which this text is valid). When seeing the valid text example, it may emit a token into the token stream such as IDENTIFIER(asd_123).
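Expressed as an ANTLR lexer rule, that identifier rule might look roughly like this (a sketch; the rule name and character classes are my assumptions, not part of the quoted post):

    IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* ;   // matches "asd_123"; a leading digit, as in "12dsadsa", cannot start this rule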
Note that I said "identifier" which is the general term for things like variable names, function names, namespace names, etc. The parser would be the thing that would understand the context in which that identifier appears, so that it would then further specify that token as being a certain thing's name.
(sidenote: the token is just a unique name given to an element of the token stream. The lexeme is the text that the token was matched from. I write the lexeme in parentheses next to the token. For example, NUMBER(123). In this case, this is a NUMBER token with a lexeme of '123'. However, with some tokens, such as operators, I omit the lexeme since it's redundant. For example, I would write SEMICOLON for the semicolon token, not SEMICOLON( ; )).
From ANTLR - When to use Parser Rules vs Lexer Rules?