How can I modify the text of tokens in a CommonTokenStream with ANTLR?

Tags: compiler-construction, antlr, antlr3, lexical-analysis

I'm trying to learn ANTLR and at the same time use it for a current project.

I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens.

Now, I would like to be able to modify the text of certain tokens in this stream, and display the now modified source code.

For example I've tried:

import org.antlr.runtime.*;
import java.util.*;

public class LexerTest
{
    public static final int IDENTIFIER_TYPE = 4;

    public static void main(String[] args)
    {
        String input = "public static void main(String[] args) { int myVar = 0; }";
        CharStream cs = new ANTLRStringStream(input);

        JavaLexer lexer = new JavaLexer(cs);
        CommonTokenStream tokens = new CommonTokenStream();
        tokens.setTokenSource(lexer);

        int size = tokens.size();
        for(int i = 0; i < size; i++)
        {
            Token token = (Token) tokens.get(i);
            if(token.getType() == IDENTIFIER_TYPE)
            {
                token.setText("V");
            }
        }
        System.out.println(tokens.toString());
    }  
}

I'm trying to set the text of every Identifier token to the string literal "V".

1. Why are my changes to the tokens' text not reflected when I call tokens.toString()?
2. How am I supposed to know the various token type IDs? I walked through with my debugger and saw that the ID for the IDENTIFIER tokens was "4" (hence my constant at the top). But how would I have known that otherwise? Is there some other way of mapping token type IDs to token names?

EDIT:

One thing that is important to me is that the tokens keep their original start and end character positions. That is, I don't want them to reflect their new positions with the variable names changed to "V". This is so I know where the tokens were in the original source text.


asked Feb 09 '10 11:02

mmcdole

1 Answer

ANTLR has a way to do this in its grammar file.

Let's say you're parsing a string consisting of numbers and strings delimited by commas. A grammar would look like this:

grammar Foo;

parse
  :  value ( ',' value )* EOF
  ;

value
  :  Number
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;

This should all look familiar to you. Let's say you want to wrap square brackets around all integer values. Here's how to do that:

grammar Foo;

options {output=template; rewrite=true;} 

parse
  :  value ( ',' value )* EOF
  ;

value
  :  n=Number -> template(num={$n.text}) "[<num>]" 
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;

As you can see, I've added some options at the top and a rewrite rule (everything after the ->) after the Number in the value parser rule.
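To give a feel for what rewrite mode does with those rules, here is a rough, plain-Java model of the idea behind a rewrite stream such as ANTLR 3's TokenRewriteStream: the original tokens are left untouched, replacements are recorded separately by token index, and they are applied only when the stream is rendered to a string. This is a simplified sketch and not ANTLR's actual implementation; the names RewriteSketch, Tok, add, replace and render are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a rewrite stream; NOT ANTLR's actual implementation.
class RewriteSketch {
    /** Hypothetical token: its text plus inclusive character offsets in the original input. */
    static final class Tok {
        final String text;
        final int start;
        final int stop;

        Tok(String text, int start, int stop) {
            this.text = text;
            this.start = start;
            this.stop = stop;
        }
    }

    private final List<Tok> tokens = new ArrayList<>();
    // Replacement text keyed by token index; the tokens themselves are never modified.
    private final Map<Integer, String> replacements = new TreeMap<>();
    private int nextOffset = 0;

    /** Appends a token, assigning it the next offsets in the (imaginary) original input. */
    void add(String text) {
        tokens.add(new Tok(text, nextOffset, nextOffset + text.length() - 1));
        nextOffset += text.length();
    }

    /** Records a replacement for the token at the given index. */
    void replace(int index, String newText) {
        replacements.put(index, newText);
    }

    Tok get(int index) {
        return tokens.get(index);
    }

    /** Renders the stream; recorded replacements are applied only here. */
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(replacements.getOrDefault(i, tokens.get(i).text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RewriteSketch stream = new RewriteSketch();
        for (String t : new String[] {"12", ",", "\"34\"", ",", "56"}) {
            stream.add(t);
        }
        stream.replace(0, "[12]");
        stream.replace(4, "[56]");

        System.out.println(stream.render()); // prints [12],"34",[56]

        // The original token still carries its own text and offsets.
        Tok last = stream.get(4);
        System.out.println(last.text + " @ " + last.start + ".." + last.stop); // prints 56 @ 8..9
    }
}
```

Because the replacements live beside the tokens rather than inside them, each token's start and stop offsets still describe the original input, which is the property asked for in the question's edit.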

Now to test it all, compile and run this class:

import org.antlr.runtime.*;

public class FooTest {
  public static void main(String[] args) throws Exception {
    String text = "12, \"34\", 56, \"a\\\"b\", 78";
    System.out.println("parsing: "+text);
    ANTLRStringStream in = new ANTLRStringStream(text);
    FooLexer lexer = new FooLexer(in);
    CommonTokenStream tokens = new TokenRewriteStream(lexer); // Note: a TokenRewriteStream!
    FooParser parser = new FooParser(tokens);
    parser.parse();
    System.out.println("tokens: "+tokens.toString());
  }
}

which produces:

parsing: 12, "34", 56, "a\"b", 78
tokens: [12],"34",[56],"a\"b",[78]
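On the second question (mapping token type IDs to names): as far as I know, ANTLR 3 writes a .tokens file next to the generated sources, with one NAME=number line per token type, and the generated lexer also declares an int constant for each lexer rule, so a hard-coded 4 should not be needed. The sketch below only shows how such a file could be read into a lookup table using plain Java. The class name TokenNames and the sample contents are made up for illustration; the real names and numbers depend on your grammar.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reads "NAME=number" lines into a type-to-name map.
class TokenNames {
    static Map<Integer, String> parse(String tokensFileContent) {
        Map<Integer, String> names = new TreeMap<>();
        for (String rawLine : tokensFileContent.split("\\R")) {
            String line = rawLine.trim();
            // Split on the LAST '=' because literal tokens such as '=' contain one themselves.
            int eq = line.lastIndexOf('=');
            if (eq <= 0 || eq == line.length() - 1) {
                continue; // skip blank or malformed lines
            }
            int type = Integer.parseInt(line.substring(eq + 1));
            // Keep the first name seen for a type if several names share it.
            names.putIfAbsent(type, line.substring(0, eq));
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample contents; a real file is produced from your own grammar.
        String sample = "Identifier=4\nDecimalLiteral=5\n'='=6\n";
        Map<Integer, String> names = parse(sample);
        System.out.println(names.get(4)); // prints Identifier
        System.out.println(names.get(6)); // prints '='
    }
}
```

In application code it is usually simpler to compare token.getType() against the constant in the generated lexer class (named after the lexer rule) than to parse the file by hand.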


answered Sep 28 '22 09:09

Bart Kiers
