Best way transform custom XML like syntax

Question

Using Python.

So basically I have a XML like tag syntax but the tags don't have attributes. So <a> but not <a value='t'>. They close regularly with </a>.

Here is my question. I have something that looks like this:

<al>
1. test
2. test2
 test with new line
3.  test3
<al>
    1. test 4
    <al>
        2. test 5
        3. test 6
        4. test 7
    </al>
</al>
4. test 8
</al>

And I want to transform it into:

<al>
<li>test</li>
<li> test2</li>
<li> test with new line</li>
<li>  test3
<al>
    <li> test 4 </li>
    <al>
        <li> test 5</li>
        <li> test 6</li>
        <li> test 7</li>
    </al>
    </li>
</al>
</li>
<li> test 8</li>
</al>

I'm not really looking for a completed solution but rather a push into the right direction. I am just wondering how the folks here would approach the problem. Solely REGEX? write a full custom parser for the attribute-less tag syntax? Hacking up existing XML parsers? etc.

Thanks in advance

spacediver · Accepted Answer

I'd recommend start with the following:

from xml.dom.minidom import parse, parseString

xml = parse(...)
l = xml.getElementsByTagName('al')

then traverse all elements in l, examining their text subnodes (as well as <al> nodes recursively).

You may start playing with this right away in the Python console.

It is easy to remove text nodes, then split text chunks with chunk.split(' ') and add <li> nodes back, as you need.

After modifying all the <al> nodes you may just call xml.toxml() to get the resulting xml as text.

Note that the element objects you get from this are linked back to the original xml document object, so do not delete the xml object in the process.

This way I personally consider more straightforward and easy to debug than mangling with multiline regexps.

Michael Kay · Answer

The way you've described your syntax, it is "XML without attributes". If that's so, it's still XML, so you can use XML tools such as XSLT and XQuery.

If you allow things that aren't allowed in XML, on the other hand, my approach would be to write a parser that handles your non-XML format and delivers XML-compatible SAX events. Then you'll be able to use any XML technology just by plugging in your parser in place of the regular XML parser.

Best way transform custom XML like syntax

Tags:

python

parsing

xml

tags

Peach Passion

2 Answers

spacediver

Michael Kay

Recent Activity

Donate For Us

Best way transform custom XML like syntax

Tags:

python

parsing

xml

tags

Peach Passion

2 Answers

spacediver

Michael Kay

Related questions

Recent Activity

Donate For Us