Transform parse tree into XML

I have a compiled ANTLR 4 grammar and I want to use it to transform an input sequence into XML. Note that my grammar is very large, with many rules, so I would like to avoid overriding each grammar rule in my code.

To avoid confusion, I will use an example. Consider the following grammar:

grammar expr;

prog: stat+ ;

stat: expr NEWLINE
 | ID '=' expr NEWLINE
 | NEWLINE
;

expr:  expr ('*'|'/') expr
 | INT
 | ID
 | '(' expr ')'
;

ID : [a-zA-Z]+ ; // match identifiers
INT : [0-9]+ ; // match integers
NEWLINE:'\r'? '\n' ; // return newlines to parser (is end-statement signal)
WS : [ \t]+ -> skip ; // toss out whitespace

Input sequence

A = 10
B = A * A

Expected output

<prog>
    <stat>
        A =
        <expr>10</expr>
        \r\n
    </stat>
    <stat>
        B =
        <expr>
            <expr>A</expr>
            *
            <expr>A</expr>
        </expr>
        \r\n
    </stat>
</prog>

This XML mirrors the structure of the input's parse tree.

Currently, I create a ParseTree and call its toStringTree method, which generates the following string:

(prog (stat A = (expr 10) \r\n) (stat B = (expr (expr A) * (expr A)) \r\n))

I then transform this string into the XML shown above, using simple generic code that works for any grammar. This approach feels clumsy. Is it possible to solve this without toStringTree? I would like to avoid overriding each grammar rule in my visitor (I have hundreds of them).
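For comparison, a generic conversion of the toStringTree output could look roughly like the sketch below (this is not the asker's actual code). It assumes every terminal is a single token with no spaces or parentheses, which holds for the string above, where the newline already appears as the literal text \r\n. The class name StringTreeToXml and the method convert are hypothetical, and each terminal is emitted on its own line.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StringTreeToXml {
    // Converts LISP-style output such as "(prog (stat A = (expr 10)))" into
    // indented XML. Assumption: terminals contain no spaces or parentheses,
    // and the input is well-formed (balanced parentheses).
    public static String convert(String lisp) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of open rule names
        int i = 0;
        int n = lisp.length();
        while (i < n) {
            char c = lisp.charAt(i);
            if (c == ' ') {
                i++;
            } else if (c == '(') {
                // "(" is immediately followed by the rule name
                int j = i + 1;
                while (j < n && lisp.charAt(j) != ' ' && lisp.charAt(j) != ')') {
                    j++;
                }
                String rule = lisp.substring(i + 1, j);
                indent(out, open.size()).append('<').append(rule).append(">\n");
                open.push(rule);
                i = j;
            } else if (c == ')') {
                // Close the innermost open rule
                String rule = open.pop();
                indent(out, open.size()).append("</").append(rule).append(">\n");
                i++;
            } else {
                // Terminal: read up to the next space or parenthesis
                int j = i;
                while (j < n && " ()".indexOf(lisp.charAt(j)) < 0) {
                    j++;
                }
                indent(out, open.size()).append(lisp, i, j).append('\n');
                i = j;
            }
        }
        return out.toString();
    }

    private static StringBuilder indent(StringBuilder sb, int level) {
        for (int k = 0; k < level; k++) {
            sb.append("    ");
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.print(convert("(prog (stat B = (expr (expr A) * (expr A)) \\r\\n))"));
    }
}
```

The assumption is also the weakness of this approach: the grammar's '(' expr ')' alternative produces terminal parentheses in the string, which a simple scanner like this one would misread as rule boundaries.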

EDIT

Basically, I need some kind of generic serialization of a ParseTree into XML. The main goal is to avoid writing a dedicated serialization method in Java for each rule.
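One generic alternative is a single recursive method that walks the tree, emitting a tag for every rule node and plain text for every terminal, so the number of grammar rules does not matter. The sketch below uses a small hypothetical Node class in place of ANTLR's ParseTree so that it runs without the ANTLR runtime; it illustrates the recursion under that assumption and is not a drop-in implementation.

```java
import java.util.Arrays;
import java.util.List;

public class TreeToXml {
    // Hypothetical stand-in for ANTLR's ParseTree: a node is either a rule
    // (name + children) or a terminal (text only).
    static class Node {
        final String name;        // rule name, or null for a terminal
        final String text;        // terminal text, or null for a rule
        final List<Node> children;

        private Node(String name, String text, List<Node> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }

        static Node rule(String name, Node... kids) {
            return new Node(name, null, Arrays.asList(kids));
        }

        static Node term(String text) {
            return new Node(null, text, List.of());
        }
    }

    // One recursive method handles every rule generically: no per-rule code.
    static void toXml(Node node, int level, StringBuilder out) {
        String pad = "    ".repeat(level);
        if (node.name == null) {
            out.append(pad).append(node.text).append('\n');
            return;
        }
        out.append(pad).append('<').append(node.name).append(">\n");
        for (Node child : node.children) {
            toXml(child, level + 1, out);
        }
        out.append(pad).append("</").append(node.name).append(">\n");
    }

    public static void main(String[] args) {
        Node tree = Node.rule("prog",
                Node.rule("stat", Node.term("A"), Node.term("="),
                        Node.rule("expr", Node.term("10"))));
        StringBuilder out = new StringBuilder();
        toXml(tree, 0, out);
        System.out.print(out);
    }
}
```

With the real ANTLR 4 runtime, the same recursion would check whether a node is a TerminalNode (and emit its getText()), otherwise take the tag name from parser.getRuleNames()[((RuleContext) t).getRuleIndex()], iterating over the children with getChildCount() and getChild(i).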

asked Mar 08 '23 20:03 by Radim Bača


1 Answer

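This approach might suit your needs. The listener overrides only the generic enterEveryRule, exitEveryRule and visitTerminal callbacks, so it works for any grammar without per-rule methods. I wrapped terminal symbols in an extra tag <t> for readability and skipped whitespace-only terminals (such as NEWLINE). It should not be a big problem to adjust the output if required.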

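// Requires the ANTLR 4 runtime (4.7+ for CharStreams) and the classes ANTLR
// generates from expr.g4: exprLexer, exprParser and exprBaseListener.
// The wrapper class name ExprToXml is arbitrary.
import java.util.Collections;

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import org.antlr.v4.runtime.tree.TerminalNode;

public class ExprToXml
{
    public static void main(final String[] args)
    {
        final exprLexer lexer = new exprLexer(CharStreams.fromString("A=10\nB = A * A\n"));
        final CommonTokenStream tokens = new CommonTokenStream(lexer);
        final exprParser parser = new exprParser(tokens);
        final ParseTree tree = parser.prog();
        // The generic callbacks below fire for every rule and every token,
        // so no rule-specific overrides are needed.
        ParseTreeWalker.DEFAULT.walk(new exprBaseListener()
        {
            final String INDENT = "    ";
            int level = 0; // current nesting depth

            @Override
            public void enterEveryRule(final ParserRuleContext ctx)
            {
                // Opening tag named after the rule, e.g. <stat>
                System.out.printf("%s<%s>%n", indent(), parser.getRuleNames()[ctx.getRuleIndex()]);
                ++level;
                super.enterEveryRule(ctx);
            }

            @Override
            public void exitEveryRule(final ParserRuleContext ctx)
            {
                --level;
                // Matching closing tag, e.g. </stat>
                System.out.printf("%s</%s>%n", indent(), parser.getRuleNames()[ctx.getRuleIndex()]);
                super.exitEveryRule(ctx);
            }

            @Override
            public void visitTerminal(final TerminalNode node)
            {
                final String value = node.getText();
                // Skip whitespace-only tokens such as NEWLINE
                if (!value.matches("\\s+"))
                {
                    System.out.printf("%s<t>%s</t>%n", indent(), value);
                }
                super.visitTerminal(node);
            }

            private String indent()
            {
                return String.join("", Collections.nCopies(level, INDENT));
            }
        }, tree);
    }
}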
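The listener above prints token text verbatim. That is fine for this grammar, but a larger grammar with tokens such as < or & would produce invalid XML. A minimal escaping helper, assuming only the five predefined XML entities need handling (the class XmlEscape and method escapeXml are hypothetical names), could be:

```java
public class XmlEscape {
    // Replaces the five XML special characters with their predefined
    // entities so terminal text cannot break the generated document.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b && c"));
    }
}
```

In visitTerminal, printing XmlEscape.escapeXml(value) instead of value would then keep the output well-formed regardless of the grammar's tokens.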
answered Mar 10 '23 10:03 by yegodm