
ANTLR4: lexer rule for: Any string as long as it doesn't contain these two side-by-side characters?

Is there any way to express this in ANTLR4:

Any string as long as it doesn't contain the asterisk immediately followed by a forward slash?

This doesn't work: (~'*/')* because ANTLR reports this error: multi-character literals are not allowed in lexer sets: '*/'

This is accepted but isn't correct: (~[*/])* because it rejects any string that contains either * or / on its own.

Roger Costello asked Jan 08 '23 13:01


2 Answers

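I had a similar problem. My solution:

( ~'*' | ( '*'+ ~[/*] ) )* '*'*

It matches any mix of non-asterisk characters and runs of asterisks that are followed by a character other than / or *, and it allows the string to end in asterisks.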
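To sanity-check the rule above outside ANTLR, here is a minimal sketch that transliterates it into a java.util.regex pattern and compares it against a plain "contains" test. The class and method names (NoStarSlashCheck, matchesRule) are made up for illustration, and the regex is only my reading of the lexer rule, not something ANTLR generates.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any grammar in this thread.
class NoStarSlashCheck {
    // Regex reading of the lexer rule ( ~'*' | ( '*'+ ~[/*] ) )* '*'*
    //   [^*]        one character that is not an asterisk
    //   \*+[^/*]    a run of asterisks followed by something other than / or *
    //   \**         optional trailing asterisks
    static final Pattern NO_STAR_SLASH =
            Pattern.compile("(?:[^*]|\\*+[^/*])*\\**");

    // True when the whole string is accepted by the pattern.
    static boolean matchesRule(String s) {
        return NO_STAR_SLASH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", "a*b/c", "/*", "a**", "a*/b", "**/" };
        for (String s : samples) {
            // Print both verdicts side by side so they can be compared.
            System.out.println("\"" + s + "\" rule=" + matchesRule(s)
                    + " containsStarSlash=" + s.contains("*/"));
        }
    }
}
```

For each sample the two columns should be opposites: the pattern accepts a string exactly when it does not contain */.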

Radosław Kotkiewicz answered Feb 10 '23 10:02


The closest I can come is to put the test in the parser instead of the lexer. That's not exactly what you're asking for, but it does work.

The trick is to use a semantic predicate before any string that must be tested for any Evil Characters. The actual testing is done in Java.

grammar myTest;

@header
{
    import java.util.*;
}

@parser::members
{
    // Despite its name, this returns true when the input is clean,
    // i.e. when it does NOT contain "*/", so the predicate passes.
    boolean hasEvilCharacters(String input)
    {
        return !input.contains("*/");
    }
}

// Mimics a very simple sentence, such as: 
//   I am clean.
//   I have evil char*/acters.
myTest
    : { hasEvilCharacters(_input.LT(1).getText()) }? String 
      (Space { hasEvilCharacters(_input.LT(1).getText()) }? String)* 
      Period EOF
    ;

String
    : ('A'..'Z' | 'a'..'z')+      
    ;

Space
    : ' '
    ;

Period
    : '.'
    ;

Tested with ANTLR 4.4 via the TestRig in ANTLRWorks 2 in NetBeans 8.0.1.
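The check inside the predicate can also be tried without ANTLR. The following is a rough sketch in plain Java that applies the same test to the two sample sentences from the grammar comment; EvilCharacterCheck and isClean are hypothetical names, and splitting on spaces only approximates how the Space token separates String tokens. One caveat: as written, the String rule appears to admit only letters, so a token rule would presumably need to allow * and / before the predicate could ever reject anything.

```java
// Hypothetical stand-alone version of the predicate's test.
class EvilCharacterCheck {
    // Same logic as the grammar's helper: true when the text is free of "*/".
    static boolean isClean(String text) {
        return !text.contains("*/");
    }

    public static void main(String[] args) {
        String[] sentences = { "I am clean.", "I have evil char*/acters." };
        for (String sentence : sentences) {
            // Split on single spaces, roughly mirroring the Space token.
            for (String word : sentence.split(" ")) {
                System.out.println(word + " -> " + (isClean(word) ? "ok" : "rejected"));
            }
        }
    }
}
```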

james.garriss answered Feb 10 '23 10:02