Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Close all opened xml tags

Tags:

python

xml

I have a file, which change it content in a short time. But I'd like to read it before it is ready. The problem is, that it is an xml-file (log). So when you read it, it could be, that not all tags are closed.

I would like to know if there is a possibility to close all opened tags correctly, that there are no problems to show it in the browser (with xslt stylsheet). This should be made by using included features of python.

like image 561
pyle Avatar asked Oct 29 '09 16:10

pyle


2 Answers

Some XML parsers allow incremental parsing of XML documents that is the parser can start working on the document without needing it to be fully loaded. The XMLTreeBuilder from the xml.etree.ElementTree module in the Python standard library is one such parser: Element Tree

As you can see in the example below you can feed data to the parser bit by bit as you read it from your input source. The appropriate hook methods in your handler class will get called when various XML "events" happen (tag started, tag data read, tag ended) allowing you to process the data as the XML document is loaded:

from xml.etree.ElementTree import XMLTreeBuilder
class MyHandler(object):
    def start(self, tag, attrib):
        # Called for each opening tag.
        print tag + " started"
    def end(self, tag):
        # Called for each closing tag.
        print tag  + " ended"
    def data(self, data):
        # Called when data is read from a tag
        print data  + " data read"
    def close(self):    
        # Called when all data has been parsed.
        print "All data read"

handler = MyHandler()

parser = XMLTreeBuilder(target=handler)

parser.feed(<sometag>)
parser.feed(<sometag-child-tag>text)
parser.feed(</sometag-child-tag>)
parser.feed(</sometag>)
parser.close()

In this example the handler would receive five events and print:

sometag started

sometag-child started

"text" data read

sometag-child ended

sometag ended

All data read

like image 87
Tendayi Mawushe Avatar answered Oct 20 '22 05:10

Tendayi Mawushe


If I am understanding your question correctly, you have a log file that is always being appended to so you get something like:

<root>
<entry> ... </entry>
<entry> ... </entry>
...
<entry> ... </entry
<!-- no closing root -->

In this case you DON'T want to use a DOM parser because it tries to read a complete document and would choke on the missing tag. Instead, a SAX or Pull parser would work because it reads the document like a stream of data rather than a complete tree. As Denis replied above, you could either close the missing tag at the end or ignore any incomplete tags before writing it out.

XML parsing on Wikipedia

like image 35
Anton Avatar answered Oct 20 '22 05:10

Anton