I'm trying to learn ANTLR and at the same time use it for a current project.
I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens.
Now, I would like to be able to modify the text of certain tokens in this stream, and display the now modified source code.
For example I've tried:
import org.antlr.runtime.*;
import java.util.*;
public class LexerTest
{
public static final int IDENTIFIER_TYPE = 4;
public static void main(String[] args)
{
String input = "public static void main(String[] args) { int myVar = 0; }";
CharStream cs = new ANTLRStringStream(input);
JavaLexer lexer = new JavaLexer(cs);
CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
int size = tokens.size();
for(int i = 0; i < size; i++)
{
Token token = (Token) tokens.get(i);
if(token.getType() == IDENTIFIER_TYPE)
{
token.setText("V");
}
}
System.out.println(tokens.toString());
}
}
I'm trying to set all Identifier token's text to the string literal "V".
Why are my changes to the token's text not reflected when I call tokens.toString()?
How am I suppose to know the various Token Type IDs? I walked through with my debugger and saw that the ID for the IDENTIFIER tokens was "4" (hence my constant at the top). But how would I have known that otherwise? Is there some other way of mapping token type ids to the token name?
EDIT:
One thing that is important to me is I wish for the tokens to have their original start and end character positions. That is, I don't want them to reflect their new positions with the variable names changed to "V". This is so I know where the tokens were in the original source text.
A language is specified using a context-free grammar expressed using Extended Backus–Naur Form (EBNF). ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers.
ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It's widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day.
ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text. It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them. It's often used to build tools and frameworks.
ParseTree tree = parser. select_stmt(); ParseTreeWalker walker = new ParseTreeWalker(); SQLiteListener listener = new SQLiteBaseListener(); ParseTreeWalker.
ANTLR has a way to do this in it's grammar file.
Let's say you're parsing a string consisting of numbers and strings delimited by comma's. A grammar would look like this:
grammar Foo;
parse
: value ( ',' value )* EOF
;
value
: Number
| String
;
String
: '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
;
Number
: '0'..'9'+
;
Space
: ( ' ' | '\t' ) {skip();}
;
This should all look familiar to you. Let's say you want to wrap square brackets around all integer values. Here's how to do that:
grammar Foo;
options {output=template; rewrite=true;}
parse
: value ( ',' value )* EOF
;
value
: n=Number -> template(num={$n.text}) "[<num>]"
| String
;
String
: '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
;
Number
: '0'..'9'+
;
Space
: ( ' ' | '\t' ) {skip();}
;
As you see, I've added some options
at the top, and added a rewrite rule (everything after the ->
) after the Number
in the value
parser rule.
Now to test it all, compile and run this class:
import org.antlr.runtime.*;
public class FooTest {
public static void main(String[] args) throws Exception {
String text = "12, \"34\", 56, \"a\\\"b\", 78";
System.out.println("parsing: "+text);
ANTLRStringStream in = new ANTLRStringStream(text);
FooLexer lexer = new FooLexer(in);
CommonTokenStream tokens = new TokenRewriteStream(lexer); // Note: a TokenRewriteStream!
FooParser parser = new FooParser(tokens);
parser.parse();
System.out.println("tokens: "+tokens.toString());
}
}
which produces:
parsing: 12, "34", 56, "a\"b", 78
tokens: [12],"34",[56],"a\"b",[78]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With