Is it possible to invoke a rule from a different grammar?
The purpose is to have two languages in the same file, with the second language introduced by (begin ...), where ... is written in the second language. The first grammar should invoke another grammar to parse that second language.
For example:
grammar A;
start_rule
: '(' 'begin' B.program ')' //or something like that
;
grammar B;
program
: something* EOF
;
something
: ...
;
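Your question could be interpreted in (at least) two ways: (1) splitting the rules of one large grammar across separate grammar files, or (2) parsing a separate language embedded inside your "main" language.
I assume it's the first, in which case you can import grammars.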
lexer grammar L;
Digit
: '0'..'9'
;
parser grammar Sub;
number
: Digit+
;
grammar Root;
import Sub;
parse
: number EOF {System.out.println("Parsed: " + $number.text);}
;
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // The lexer is generated from L.g; the parser from Root.g, which imports Sub.
        L lexer = new L(new ANTLRStringStream("42"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        RootParser parser = new RootParser(tokens);
        parser.parse();
    }
}
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool L.g
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool Root.g
bart@hades:~/Programming/ANTLR/Demos/Composite$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp .:antlr-3.3.jar Main
which will print:
Parsed: 42
to the console.
For more information, see: http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars
A nice example of a language inside a language is regex. You have the "normal" regex language with its meta characters, but there's another one in it: the language that describes a character set (or character class).
Instead of accounting for the meta characters of a character set (range -, negation ^, etc.) inside your regex-grammar, you could simply consider a character set as a single token consisting of a [ and then everything up to and including ] (with possibly \] in it!) inside your regex-grammar. When you then stumble upon a CharSet token in one of your parser rules, you invoke the CharSet-parser.
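To make the "single token" idea concrete without needing ANTLR, here is a minimal hand-written sketch in plain Java. The class name CharSetScanner and the method scanCharSet are made up for this illustration; the code only mimics what a lexer rule of the shape '[' ... ']' with backslash escapes does, and is not how ANTLR implements it.

```java
public class CharSetScanner {

    // Returns the index just past the closing ']' of the character set that
    // starts at 'start', or -1 if 'start' is not a '[', the set is empty,
    // or the set is never closed. A backslash always consumes the character
    // after it, so "\]" does not end the set.
    static int scanCharSet(String input, int start) {
        if (start >= input.length() || input.charAt(start) != '[') {
            return -1;
        }
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2;            // skip the backslash and the escaped character
            } else if (c == ']') {
                // require at least one item between the brackets
                return i > start + 1 ? i + 1 : -1;
            } else {
                i++;
            }
        }
        return -1;                 // ran off the end: no closing ']'
    }

    public static void main(String[] args) {
        String regex = "a[^\\]x-z]+b";          // the regex a[^\]x-z]+b
        int from = regex.indexOf('[');
        int to = scanCharSet(regex, from);
        // Prints the whole character set as one chunk: [^\]x-z]
        System.out.println(regex.substring(from, to));
    }
}
```

The outer language never looks inside the brackets; it only needs to know where the chunk ends, so the chunk's text can be handed to a second parser.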
grammar Regex;
options {
output=AST;
}
tokens {
REGEX;
ATOM;
CHARSET;
INT;
GROUP;
CONTENTS;
}
@members {
  // Parses a regex source string and returns the root of its AST.
  public static CommonTree ast(String source) throws RecognitionException {
    RegexLexer lexer = new RegexLexer(new ANTLRStringStream(source));
    RegexParser parser = new RegexParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}
parse
: atom+ EOF -> ^(REGEX atom+)
;
atom
: group quantifier? -> ^(ATOM group quantifier?)
| EscapeSeq quantifier? -> ^(ATOM EscapeSeq quantifier?)
| Other quantifier? -> ^(ATOM Other quantifier?)
| CharSet quantifier? -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?)
;
group
: '(' atom+ ')' -> ^(GROUP atom+)
;
quantifier
: '+'
| '*'
;
CharSet
: '[' (('\\' .) | ~('\\' | ']'))+ ']'
;
EscapeSeq
: '\\' .
;
Other
: ~('\\' | '(' | ')' | '[' | ']' | '+' | '*')
;
grammar CharSet;
options {
output=AST;
}
tokens {
NORMAL_CHAR_SET;
NEGATED_CHAR_SET;
RANGE;
}
@members {
  // Parses the text of a single character set and returns the root of its AST.
  public static CommonTree ast(String source) throws RecognitionException {
    CharSetLexer lexer = new CharSetLexer(new ANTLRStringStream(source));
    CharSetParser parser = new CharSetParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}
parse
: OSqBr ( normal -> ^(NORMAL_CHAR_SET normal)
| negated -> ^(NEGATED_CHAR_SET negated)
)
CSqBr
;
normal
: (EscapeSeq | Hyphen | Other) atom* Hyphen?
;
negated
: Caret normal -> normal
;
atom
: EscapeSeq
| Caret
| Other
| range
;
range
: from=Other Hyphen to=Other -> ^(RANGE $from $to)
;
OSqBr
: '['
;
CSqBr
: ']'
;
EscapeSeq
: '\\' .
;
Caret
: '^'
;
Hyphen
: '-'
;
Other
: ~('-' | '\\' | '[' | ']')
;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // The backslash is doubled only because this is a Java string literal.
        CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
        // Render the AST in Graphviz DOT format.
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
    }
}
If you run the main class, it prints the DOT output of the AST for the regex ((xyz)*[^\da-f])foo.
The magic is inside the Regex.g grammar, in the atom rule, where I inserted a tree node in a rewrite rule by invoking the static ast method from the CharSetParser class:
CharSet ... -> ^(... {CharSetParser.ast($CharSet.text)} ...)
Note that inside such rewrite rules, there must not be a semicolon! So this would be wrong: {CharSetParser.ast($CharSet.text);}
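The same delegation pattern can be sketched in plain Java, without ANTLR, to show what the rewrite rule achieves: the outer parser cuts out the embedded chunk and splices in whatever tree the inner parser's static entry point returns. Everything here (EmbeddedParserSketch, regexAst, charSetAst) is a hypothetical stand-in. The outer language is reduced to single-character atoms plus character sets, ranges are detected more loosely than in the CharSet grammar, and the input is assumed to be well formed.

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddedParserSketch {

    // Inner "language": parses the text of one character set, e.g. [^\da-f],
    // and returns a LISP-style tree string.
    static String charSetAst(String source) {
        int i = 1;                                  // skip '['
        int end = source.length() - 1;              // index of the closing ']'
        boolean negated = source.charAt(i) == '^';
        if (negated) i++;
        List<String> items = new ArrayList<>();
        while (i < end) {
            String item;
            if (source.charAt(i) == '\\') {         // escape sequence: two characters
                item = source.substring(i, i + 2);
                i += 2;
            } else {
                item = source.substring(i, i + 1);
                i++;
            }
            // Simplified range check: single char, '-', single non-escape char.
            if (item.length() == 1 && i + 1 < end
                    && source.charAt(i) == '-' && source.charAt(i + 1) != '\\') {
                item = "(RANGE " + item + " " + source.charAt(i + 1) + ")";
                i += 2;
            }
            items.add(item);
        }
        String root = negated ? "NEGATED_CHAR_SET" : "NORMAL_CHAR_SET";
        return "(" + root + " " + String.join(" ", items) + ")";
    }

    // Outer "language": a flat list of atoms. On '[' it cuts out the whole
    // character set and delegates to charSetAst, much like the
    // {CharSetParser.ast($CharSet.text)} action in the rewrite rule.
    static String regexAst(String source) {
        List<String> atoms = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (source.charAt(i) == '[') {
                int close = i + 1;
                while (source.charAt(close) != ']') {
                    close += source.charAt(close) == '\\' ? 2 : 1;
                }
                atoms.add("(CHARSET " + charSetAst(source.substring(i, close + 1)) + ")");
                i = close + 1;
            } else {
                atoms.add("(ATOM " + source.charAt(i) + ")");
                i++;
            }
        }
        return "(REGEX " + String.join(" ", atoms) + ")";
    }

    public static void main(String[] args) {
        // Prints: (REGEX (ATOM x) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))) (ATOM y))
        System.out.println(regexAst("x[^\\da-f]y"));
    }
}
```

Because charSetAst is used as a value inside a larger expression, it plays the same role as the expression inside the braces of the rewrite rule, which is presumably why a trailing semicolon is not allowed there.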
And here's how to create tree walkers for both grammars:
tree grammar RegexWalker;
options {
tokenVocab=Regex;
ASTLabelType=CommonTree;
}
walk
: ^(REGEX atom+) {System.out.println("REGEX: " + $start.toStringTree());}
;
atom
: ^(ATOM group quantifier?)
| ^(ATOM EscapeSeq quantifier?)
| ^(ATOM Other quantifier?)
| ^(CHARSET t=. quantifier?) {CharSetWalker.walk($t);}
;
group
: ^(GROUP atom+)
;
quantifier
: '+'
| '*'
;
tree grammar CharSetWalker;
options {
tokenVocab=CharSet;
ASTLabelType=CommonTree;
}
@members {
  // Entry point that RegexWalker calls to walk an embedded character-set subtree.
  public static void walk(CommonTree tree) {
    try {
      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
      CharSetWalker walker = new CharSetWalker(nodes);
      walker.walk();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}
walk
: ^(NORMAL_CHAR_SET normal) {System.out.println("NORMAL_CHAR_SET: " + $start.toStringTree());}
| ^(NEGATED_CHAR_SET normal) {System.out.println("NEGATED_CHAR_SET: " + $start.toStringTree());}
;
normal
: (EscapeSeq | Hyphen | Other) atom* Hyphen?
;
atom
: EscapeSeq
| Caret
| Other
| range
;
range
: ^(RANGE Other Other)
;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // Build the AST with the parser, then walk it with the tree grammar.
        CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
        RegexWalker walker = new RegexWalker(nodes);
        walker.walk();
    }
}
To run the demo, do:
java -cp antlr-3.3.jar org.antlr.Tool CharSet.g
java -cp antlr-3.3.jar org.antlr.Tool Regex.g
java -cp antlr-3.3.jar org.antlr.Tool CharSetWalker.g
java -cp antlr-3.3.jar org.antlr.Tool RegexWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
which will print:
NEGATED_CHAR_SET: (NEGATED_CHAR_SET \d (RANGE a f))
REGEX: (REGEX (ATOM (GROUP (ATOM (GROUP (ATOM x) (ATOM y) (ATOM z)) *) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))))) (ATOM f) (ATOM o) (ATOM o))
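The NEGATED_CHAR_SET line comes first because the action after ^(REGEX atom+) only runs once all children have been matched, and matching the CHARSET child already triggered the call into the second walker. A minimal plain-Java sketch of that ordering follows. The Node class is a hypothetical stand-in for CommonTree, walkRegex and walkCharSet stand in for the two generated walkers, and the sketch only looks at the top-level children of REGEX.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WalkerOrderSketch {

    // Hypothetical minimal tree node; stands in for CommonTree.
    static class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }

        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Collects the messages in the order the "actions" fire.
    static final List<String> log = new ArrayList<>();

    // Inner walker: handles the subtree of the embedded language.
    static void walkCharSet(Node set) {
        log.add(set.text + ": " + set.toStringTree());
    }

    // Outer walker: visits its children first, delegating CHARSET children
    // to the inner walker, and only then runs its own action.
    static void walkRegex(Node regex) {
        for (Node atom : regex.children) {
            if (atom.text.equals("CHARSET")) {
                walkCharSet(atom.children.get(0));
            }
        }
        log.add("REGEX: " + regex.toStringTree());
    }

    public static void main(String[] args) {
        Node tree = new Node("REGEX",
            new Node("ATOM", new Node("x")),
            new Node("CHARSET",
                new Node("NEGATED_CHAR_SET", new Node("\\d"),
                    new Node("RANGE", new Node("a"), new Node("f")))));
        walkRegex(tree);
        // The NEGATED_CHAR_SET line is printed before the REGEX line.
        log.forEach(System.out::println);
    }
}
```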