
Trying to use keywords as identifiers in ANTLR4; not working

Tags:

antlr4

I'm trying to get some SQL keywords accepted as identifiers when they appear where an identifier is expected. The ANTLR book (p. 210) suggests this trick:

id : 'if' | 'call' | 'then' | ID ;

I've got something similar, but it's not working, and I assume it's a misunderstanding on my part. regular_ident is the parser rule for an identifier:

regular_ident :  // (1)
        KEYWORD_AS_IDENT
        |
        REGULAR_IDENT
    ;

REGULAR_IDENT is the main lexer rule for identifiers. It's roughly this (simplified here), and it works:

REGULAR_IDENT :
        [a-zA-Z]  ( [a-zA-Z0-9] * )
    ;

KEYWORD_AS_IDENT is a lexer rule listing the special words. Here's an extract:

KEYWORD_AS_IDENT :  // (2)
[...snip...]
  | FILESTREAM
  | SPARSE
  | NO
  | ACTION
  | PERSISTED
  | FILETABLE_DIRECTORY
  | FILETABLE_COLLATE_FILENAME
  | FILETABLE_PRIMARY_KEY_CONSTRAINT_NAME
  | FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME
  | FILETABLE_FULLPATH_UNIQUE_CONSTRAINT_NAME
  | COLUMN_SET
  | ALL_SPARSE_COLUMNS
 ;

where the components are lexer rules defined elsewhere:

SPARSE : 'sparse' ;
NO     : 'no' ;
(etc)

If I give it fetch aaa as input ('aaa' is not a keyword), it parses:

[image: successfully parsing a normal identifier]

But if I give it fetch sparse ('sparse' is a keyword), it fails:

[image: failing to parse with a keyword]

Perhaps I'm being dumb, but I can't see why, since SPARSE is a member of KEYWORD_AS_IDENT. If I cut and paste some of (2) into (1) to get this:

regular_ident :
    FILESTREAM
  | SPARSE
  | NO
  | ACTION
  | PERSISTED
  | FILETABLE_DIRECTORY
        |
    REGULAR_IDENT
    ;

it suddenly accepts fetch sparse, because 'sparse' is now treated as a regular_ident.

But why does (1) not work? I can fix it trivially by inlining all of KEYWORD_AS_IDENT, but I need to know what I'm missing.

All suggestions appreciated.

asked Jan 29 '26 10:01 by user3779002


1 Answer

Reply from Eric Vergnaud from google group antlr-discussion:

LAST is declared before KEYWORD_AS_IDENT so when the lexer encounters 'last', it generates a LAST token, not a KEYWORD_AS_IDENT. Your start rule does not accept LAST token as a valid input, hence the shouting. Your grammar will actually NEVER produce a KEYWORD_AS_IDENT token, because another valid token will match before. It seems you are trying to get the lexer do the job of the parser i.e. handle multiple semantic alternatives, but at the time the token reaches the parser it's too late... Have you tried making KEYWORD_AS_IDENT a parser rule (lowercase) rather than a lexer rule?

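So my understanding of the lexer was faulty, and he's correct that I was trying to get it to do the parser's job. The lexer picks the longest match and, on a tie, the rule defined first, so 'sparse' always comes out as a SPARSE token and never as KEYWORD_AS_IDENT.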
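For anyone landing here later, this is a minimal sketch of that suggestion, with the keyword list moved into a lowercase parser rule. The grammar name, the fetch_stmt start rule, the FETCH and WS tokens, and the 'action' and 'persisted' literals are assumptions added only to make the fragment self-contained; they are not taken from my real grammar, and the keyword list is cut down to a few entries.

```antlr
grammar KeywordIdent;   // hypothetical grammar name

// Hypothetical start rule, only here so the sketch can be tried on "fetch sparse".
fetch_stmt : FETCH regular_ident EOF ;

regular_ident
    : keyword_as_ident
    | REGULAR_IDENT
    ;

// Parser rule (lowercase): it matches keyword tokens the lexer has
// already produced, instead of asking the lexer for a combined token.
keyword_as_ident
    : SPARSE
    | NO
    | ACTION
    | PERSISTED
    ;

// Keyword tokens come before REGULAR_IDENT so they win when both
// rules match the same text.
FETCH     : 'fetch' ;
SPARSE    : 'sparse' ;
NO        : 'no' ;
ACTION    : 'action' ;      // literal assumed
PERSISTED : 'persisted' ;   // literal assumed

REGULAR_IDENT : [a-zA-Z] [a-zA-Z0-9]* ;

WS : [ \t\r\n]+ -> skip ;   // assumed whitespace handling
```

With this layout the lexer still emits SPARSE for 'sparse', but regular_ident now accepts that token through keyword_as_ident, which is the same effect as inlining the list into (1).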

answered Feb 03 '26 08:02 by user3779002


