
Do logical AND and NOT exist in ANTLR?

Is there NOT logic in ANTLR? I'm basically trying to negate a rule that I have and was wondering if it's possible. Also, is there AND logic?

Victor asked Apr 03 '11 21:04



1 Answer

@larsmans already supplied the answer; I'd just like to give examples of the legal negations in ANTLR rules, since mistakes with them are quite common.

The negation operator in ANTLR is ~ (tilde). Inside lexer rules, the ~ negates a single character:

NOT_A : ~'A';

matches any character except 'A' and:

NOT_LOWER_CASE : ~('a'..'z');

matches any character except a lowercase ASCII letter. The last example could also be written as:

NOT_LOWER_CASE : ~LOWER_CASE;
LOWER_CASE : 'a'..'z';

As long as you negate just a single character, it's valid to use ~. It is invalid to do something like this:

INVALID : ~('a' | 'aa');

because you can't negate the string 'aa'.
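
By contrast, a set made up only of single characters (or character ranges) can be negated. The snippets below are a small sketch of that; the rule names NOT_A_OR_B and STRING are made up for illustration, and each rule is shown in isolation rather than as part of one complete grammar:

```antlr
// Valid: every alternative inside the parentheses is a single character.
NOT_A_OR_B : ~('a' | 'b');

// Valid: a double-quoted string without escape sequences. The negated set
// excludes the quote, the backslash and the line-break characters, and is
// applied to one character at a time.
STRING : '"' ~('"' | '\\' | '\r' | '\n')* '"';
```

The second rule is the usual reason to reach for ~ in a lexer: "keep consuming characters until one of these shows up".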

Inside parser rules, negation does not operate on characters but on tokens. So the parser rule:

parse
  :  ~B
  ;

A : 'a';
B : 'b';
C : 'c';

does not match "any character other than 'b'"; it matches any single token other than the B token. With the three tokens defined here, it matches either token A (character 'a') or token C (character 'c').
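
A set of tokens can be negated in a parser rule in the same way. The following is a minimal sketch that reuses the three tokens from above; the grammar name NegationDemo is an assumption added only to make the example complete:

```antlr
grammar NegationDemo;

// Matches one token that is neither A nor B.
// With only A, B and C defined, that leaves the C token.
parse
  :  ~(A | B)
  ;

A : 'a';
B : 'b';
C : 'c';
```

Note that the parentheses contain token names, not character literals: the parser only ever sees the tokens the lexer produced.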

The same logic applies to the . (DOT) operator:

  • inside lexer rules it matches any character from the set \u0000..\uFFFF;
  • inside parser rules it matches any token (any lexer rule).
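
To make the two meanings of the DOT concrete, here is a small sketch that uses it once in a parser rule and once in a lexer rule. The grammar name DotDemo and the ANY rule are assumptions for illustration; ANY is placed last so that the more specific single-character rules are normally preferred for 'a', 'b' and 'c':

```antlr
grammar DotDemo;

// Parser rule: an A token, then any single token, then a C token.
parse
  :  A . C
  ;

A   : 'a';
B   : 'b';
C   : 'c';

// Lexer rule: here the dot stands for any single character.
ANY : . ;
```

Under these assumptions, the inputs "abc" and "a?c" should both be accepted by parse: the middle token is B in the first case and ANY in the second.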

Bart Kiers answered Oct 19 '22 14:10