
Practical difference between parser rules and lexer rules in ANTLR?

I understand the theory behind separating parser rules and lexer rules, but what are the practical differences between these two statements in ANTLR:

my_rule: ... ;
MY_RULE: ... ;

Do they result in different ASTs? Different performance? Potential ambiguities?

Tony the Pony asked Nov 28 '10 16:11



2 Answers

... what are the practical differences between these two statements in ANTLR ...

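MY_RULE is a lexer rule (in ANTLR, lexer rule names start with an uppercase letter). It is used to tokenize your input source, so it represents a fundamental building block of your language.

my_rule is a parser rule (its name starts with a lowercase letter). It is called from the parser and consists of zero or more other parser rules or tokens produced by the lexer.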

That's the difference.

Do they result in different AST trees? Different performance? ...

The parser builds the AST from the tokens produced by the lexer, so the questions make no sense (to me). A lexer merely "feeds" the parser a one-dimensional stream of tokens.
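To make the two levels concrete, here is a minimal hand-written sketch in Python rather than generated ANTLR code. The names `TOKEN_SPEC`, `tokenize`, and `parse_sum` are made up for illustration, and the toy grammar (`sum : NUMBER (PLUS NUMBER)* ;`) is an assumption, not something from the question. The lexer half only ever looks at characters and returns a flat list; the parser half only ever looks at token types and returns a nested tree.

```python
import re

# Lexer level: analogous to uppercase rules such as MY_RULE.
# Each pattern matches raw characters and yields one token type.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character string into a flat list of (type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":  # whitespace is dropped before the parser sees it
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_sum(tokens):
    """Parser level, analogous to a lowercase rule: sum : NUMBER (PLUS NUMBER)* ;"""

    def expect(index, kind):
        # The parser checks token types only; it never inspects raw characters.
        if index >= len(tokens) or tokens[index][0] != kind:
            raise SyntaxError(f"expected {kind} at token {index}")
        return tokens[index][1]

    tree = ("num", expect(0, "NUMBER"))
    index = 1
    while index < len(tokens):
        expect(index, "PLUS")
        right = ("num", expect(index + 1, "NUMBER"))
        tree = ("add", tree, right)  # left-associative nesting
        index += 2
    return tree
```

Calling `tokenize("1 + 22+3")` gives a one-dimensional list of five tokens, and passing that list to `parse_sum` gives a nested tuple. Any tree structure comes from the parser rule, not from the lexer rule.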

Bart Kiers answered Oct 16 '22 14:10


This post may be helpful:

The lexer is responsible for the first step, and its only job is to create a "token stream" from text. It is not responsible for understanding the semantics of your language; it is only interested in understanding the syntax of your language.

For example, syntax is the rule that an identifier may only use letters, digits and underscores, as long as it doesn't start with a digit. The responsibility of the lexer is to understand this rule. In this case, the lexer would accept the sequence of characters "asd_123" but reject the characters "12dsadsa" (assuming that there isn't another rule in which this text is valid). When seeing the valid text example, it may emit a token into the token stream such as IDENTIFIER(asd_123).
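The identifier rule above can be sketched as a regular expression. This is only an illustration in Python, not ANTLR output; the names `IDENTIFIER` and `lex_identifier` are hypothetical, and the sketch assumes ASCII letters and that a leading underscore is allowed.

```python
import re

# Letters, digits and underscores, but the first character must not be a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def lex_identifier(text):
    """Return an IDENTIFIER token for text, or raise if the rule does not match."""
    if IDENTIFIER.fullmatch(text) is None:
        raise ValueError(f"{text!r} is not a valid identifier")
    return ("IDENTIFIER", text)
```

With this sketch, `lex_identifier("asd_123")` returns the pair `("IDENTIFIER", "asd_123")`, while `lex_identifier("12dsadsa")` raises `ValueError`.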

Note that I said "identifier" which is the general term for things like variable names, function names, namespace names, etc. The parser would be the thing that would understand the context in which that identifier appears, so that it would then further specify that token as being a certain thing's name.

(sidenote: the token is just a unique name given to an element of the token stream. The lexeme is the text that the token was matched from. I write the lexeme in parentheses next to the token. For example, NUMBER(123). In this case, this is a NUMBER token with a lexeme of '123'. However, with some tokens, such as operators, I omit the lexeme since it's redundant. For example, I would write SEMICOLON for the semicolon token, not SEMICOLON( ; )).
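The token/lexeme distinction and the notation used here can be sketched in a few lines of Python. The `Token` type, the `show` helper, and the set of fixed-lexeme token types are all assumptions made for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str    # the token name, e.g. NUMBER
    lexeme: str  # the text the token was matched from, e.g. "123"


# Token types whose lexeme is always the same, so printing it adds nothing.
FIXED_LEXEME_TYPES = {"SEMICOLON", "PLUS"}


def show(token):
    """Format a token as TYPE(lexeme), omitting the lexeme when it is redundant."""
    if token.type in FIXED_LEXEME_TYPES:
        return token.type
    return f"{token.type}({token.lexeme})"
```

Here `show(Token("NUMBER", "123"))` yields `NUMBER(123)`, and `show(Token("SEMICOLON", ";"))` yields just `SEMICOLON`.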

From ANTLR - When to use Parser Rules vs Lexer Rules?

uestczhangchao answered Oct 16 '22 16:10