Using Python.
So basically I have an XML-like tag syntax, but the tags don't have attributes: <a> is allowed, but <a value='t'> is not. Tags close normally with </a>.
Here is my question. I have something that looks like this:
<al>
1. test
2. test2
test with new line
3. test3
<al>
1. test 4
<al>
2. test 5
3. test 6
4. test 7
</al>
</al>
4. test 8
</al>
And I want to transform it into:
<al>
<li>test</li>
<li> test2</li>
<li> test with new line</li>
<li> test3
<al>
<li> test 4 </li>
<al>
<li> test 5</li>
<li> test 6</li>
<li> test 7</li>
</al>
</li>
</al>
</li>
<li> test 8</li>
</al>
I'm not really looking for a completed solution but rather a push in the right direction. I am just wondering how the folks here would approach the problem. Solely regex? Write a full custom parser for the attribute-less tag syntax? Hack up an existing XML parser? Something else?
Thanks in advance
I'd recommend starting with the following:
from xml.dom.minidom import parse, parseString
xml = parse(...)
l = xml.getElementsByTagName('al')
Then traverse all elements in l, examining their text subnodes (as well as <al> nodes, recursively).
You may start playing with this right away in the Python console.
It is easy to remove text nodes, then split text chunks with chunk.split('\n') and add <li> nodes back, as you need.
After modifying all the <al> nodes you may just call xml.toxml() to get the resulting XML as text.
Note that the element objects you get from this are linked back to the original xml document object, so do not delete the xml object in the process.
I personally consider this approach more straightforward and easier to debug than wrangling multiline regexps.
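The traversal described above might look roughly like the sketch below. It is only one possible reading of the question, under these assumptions: every non-empty text line becomes its own <li>, a leading "1." style number is stripped, and a nested <al> is moved into the <li> that precedes it. The helper name convert_list and the NUMBER_PREFIX pattern are made up for illustration.

```python
import re
from xml.dom.minidom import parseString

# Assumed item prefix: digits, a dot, optional whitespace ("1. ", "12.").
NUMBER_PREFIX = re.compile(r"^\d+\.\s*")


def convert_list(doc, al):
    """Rewrite the children of one <al> element into <li> elements, in place."""
    last_li = None
    # Iterate over a copy, because the loop body mutates al.childNodes.
    for child in list(al.childNodes):
        if child.nodeType == child.TEXT_NODE:
            for line in child.data.split("\n"):
                line = line.strip()
                if not line:
                    continue
                last_li = doc.createElement("li")
                text = NUMBER_PREFIX.sub("", line)
                last_li.appendChild(doc.createTextNode(text))
                al.insertBefore(last_li, child)
            al.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE and child.tagName == "al":
            convert_list(doc, child)
            if last_li is not None:
                # appendChild detaches the node from its old parent first,
                # so this moves the nested list into the preceding <li>.
                last_li.appendChild(child)


sample = "<al>\n1. test\n2. test2\n<al>\n1. test 4\n</al>\n3. test 8\n</al>"
doc = parseString(sample)
convert_list(doc, doc.documentElement)
print(doc.documentElement.toxml())
```

Recursing from the root element, rather than looping over the flat result of getElementsByTagName, keeps the nesting explicit and avoids visiting a nested <al> twice.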
The way you've described your syntax, it is "XML without attributes". If that's so, it's still XML, so you can use XML tools such as XSLT and XQuery.
If you allow things that aren't allowed in XML, on the other hand, my approach would be to write a parser that handles your non-XML format and delivers XML-compatible SAX events. Then you'll be able to use any XML technology just by plugging in your parser in place of the regular XML parser.
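To make the second option more concrete, here is a minimal sketch of such a parser, assuming the only markup is <name> and </name> and that everything else, including a stray < or &, is plain text. The function name feed_events and the TAG pattern are hypothetical, and the sketch does not check that tags are properly nested.

```python
import io
import re
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Assumed tag grammar: "<name>" or "</name>", never any attributes.
TAG = re.compile(r"<(/?)(\w+)>")


def feed_events(text: str, handler: ContentHandler) -> None:
    """Tokenize attribute-less tags and report them as SAX events."""
    handler.startDocument()
    pos = 0
    for match in TAG.finditer(text):
        if match.start() > pos:
            # Everything between two tags is reported as character data.
            handler.characters(text[pos:match.start()])
        closing, name = match.groups()
        if closing:
            handler.endElement(name)
        else:
            handler.startElement(name, AttributesImpl({}))
        pos = match.end()
    if pos < len(text):
        handler.characters(text[pos:])
    handler.endDocument()


# Any SAX ContentHandler can consume the events; XMLGenerator turns them
# back into well-formed XML, escaping characters that XML does not allow.
out = io.StringIO()
feed_events("<al>1 < 2 & more</al>", XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```

Because the function only talks to the ContentHandler interface, the same events could be sent to a DOM builder or any other SAX consumer instead of XMLGenerator.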