
Transform parse tree into XML

I have a compiled grammar and I want to use it to transform an input sequence into XML. Note that my grammar is very large, with many rules, and I would like to avoid overriding each grammar rule in my code.

I will use an example to avoid confusion. Consider the following grammar:

grammar expr;

prog: stat+ ;

stat: expr NEWLINE
    | ID '=' expr NEWLINE
    ;

expr: expr ('*'|'/') expr
    | INT
    | ID
    | '(' expr ')'
    ;

ID : [a-zA-Z]+ ; // match identifiers
INT : [0-9]+ ; // match integers
NEWLINE:'\r'? '\n' ; // return newlines to parser (is end-statement signal)
WS : [ \t]+ -> skip ; // toss out whitespace

Input sequence

A = 10
B = A * A

Expected output

<prog>
    <stat>
        A = 
        <expr> 10
        </expr>
    </stat>
    <stat>
        B = 
        <expr>
            <expr> A</expr>
            *
            <expr> A</expr>
        </expr>
    </stat>
</prog>

which corresponds to a parse tree

[image: parse tree for the input sequence]

Currently I create a ParseTree and call its toStringTree method, which generates the following string:

(prog (stat A = (expr 10) \r\n) (stat B = (expr (expr A) * (expr A)) \r\n))

which I subsequently transform into the XML shown above, using simple generic code that works for any grammar. I find this approach clumsy. Is it possible to solve this without toStringTree? I would like to avoid having to override each grammar rule in my Visitor (I have hundreds of them).
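For reference, the string-based approach described above could look roughly like the sketch below. It is not the asker's actual code, which is not shown; the class name LispTreeToXml and its methods are hypothetical. It assumes that no token contains whitespace or parentheses, which already fails for this grammar's '(' expr ')' alternative, and it emits token text without XML escaping. Each token is also placed on its own line, so the layout differs slightly from the expected output.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not the asker's actual code.
final class LispTreeToXml {

    /** Converts a toStringTree-style string such as "(prog (stat A))" into indented XML. */
    static String convert(String lisp) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // names of the rules currently open
        int i = 0;
        while (i < lisp.length()) {
            char c = lisp.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                int end = wordEnd(lisp, i + 1); // the word right after '(' is the rule name
                String rule = lisp.substring(i + 1, end);
                appendLine(out, open.size(), "<" + rule + ">");
                open.push(rule);
                i = end;
            } else if (c == ')') {
                String rule = open.pop();
                appendLine(out, open.size(), "</" + rule + ">");
                i++;
            } else {
                int end = wordEnd(lisp, i);
                String text = lisp.substring(i, end);
                // Line breaks appear as the literal characters \r and \n; drop such tokens.
                if (!text.matches("(\\\\[rnt])+")) {
                    appendLine(out, open.size(), text); // emitted verbatim, no XML escaping
                }
                i = end;
            }
        }
        return out.toString();
    }

    /** Index of the first whitespace or parenthesis at or after {@code from}. */
    private static int wordEnd(String s, int from) {
        int i = from;
        while (i < s.length() && !Character.isWhitespace(s.charAt(i))
                && s.charAt(i) != '(' && s.charAt(i) != ')') {
            i++;
        }
        return i;
    }

    private static void appendLine(StringBuilder out, int level, String text) {
        for (int k = 0; k < level; k++) {
            out.append("    ");
        }
        out.append(text).append('\n');
    }

    public static void main(String[] args) {
        String tree = "(prog (stat A = (expr 10) \\r\\n) (stat B = (expr (expr A) * (expr A)) \\r\\n))";
        System.out.print(convert(tree));
    }
}
```

The ambiguity around parenthesis tokens is the main reason this route is fragile: once a literal "(" shows up as token text, the string can no longer be split back into rules and tokens reliably.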


I basically need some kind of generic serialization of a ParseTree into XML. The main goal is that I should not have to write a special serialization method in Java for each rule.

Radim Bača asked Mar 08 '23 20:03

1 Answer

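This approach may suit your needs. It relies only on the generic enterEveryRule, exitEveryRule, and visitTerminal callbacks, so no per-rule overrides are needed. I wrapped terminal symbols in an extra t tag for readability and skipped those consisting only of whitespace. It should not be hard to adjust the output if required.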

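import java.util.Collections;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import org.antlr.v4.runtime.tree.TerminalNode;

final exprLexer lexer = new exprLexer(CharStreams.fromString("A=10\nB = A * A\n"));
final CommonTokenStream tokens = new CommonTokenStream(lexer);
final exprParser parser = new exprParser(tokens);
final ParseTree tree = parser.prog();
ParseTreeWalker.DEFAULT.walk(new exprBaseListener() {
    final String INDENT = "    ";
    int level = 0;

    @Override
    public void enterEveryRule(final ParserRuleContext ctx) {
        System.out.printf("%s<%s>%n", indent(), parser.getRuleNames()[ctx.getRuleIndex()]);
        level++; // children are printed one level deeper
    }

    @Override
    public void exitEveryRule(final ParserRuleContext ctx) {
        level--; // back to this rule's own indentation before closing it
        System.out.printf("%s</%s>%n", indent(), parser.getRuleNames()[ctx.getRuleIndex()]);
    }

    @Override
    public void visitTerminal(final TerminalNode node) {
        final String value = node.getText();
        if (!value.matches("\\s+")) { // skip whitespace-only tokens such as NEWLINE
            System.out.printf("%s<t>%s</t>%n", indent(), value);
        }
    }

    private String indent() {
        return String.join("", Collections.nCopies(level, INDENT));
    }
}, tree);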
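One caveat: the listener prints token text verbatim, so with a larger grammar a token such as < or & would make the output malformed XML. The example grammar has no such tokens, so this is only a precaution. Below is a minimal escaping sketch; the class name XmlText and its escape method are hypothetical and not part of ANTLR.

```java
// Hypothetical helper for escaping token text placed inside an XML element.
final class XmlText {

    /** Replaces the characters that are significant in XML element content. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && c > d"));
    }
}
```

In visitTerminal it would be used as XmlText.escape(value) in place of the raw token text.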
yegodm answered Mar 10 '23 10:03
