I have a file, which change it content in a short time. But I'd like to read it before it is ready. The problem is, that it is an xml-file (log). So when you read it, it could be, that not all tags are closed.
I would like to know if there is a possibility to close all opened tags correctly, that there are no problems to show it in the browser (with xslt stylsheet). This should be made by using included features of python.
Some XML parsers allow incremental parsing of XML documents that is the parser can start working on the document without needing it to be fully loaded. The XMLTreeBuilder from the xml.etree.ElementTree module in the Python standard library is one such parser: Element Tree
As you can see in the example below you can feed data to the parser bit by bit as you read it from your input source. The appropriate hook methods in your handler class will get called when various XML "events" happen (tag started, tag data read, tag ended) allowing you to process the data as the XML document is loaded:
from xml.etree.ElementTree import XMLTreeBuilder
class MyHandler(object):
def start(self, tag, attrib):
# Called for each opening tag.
print tag + " started"
def end(self, tag):
# Called for each closing tag.
print tag + " ended"
def data(self, data):
# Called when data is read from a tag
print data + " data read"
def close(self):
# Called when all data has been parsed.
print "All data read"
handler = MyHandler()
parser = XMLTreeBuilder(target=handler)
parser.feed(<sometag>)
parser.feed(<sometag-child-tag>text)
parser.feed(</sometag-child-tag>)
parser.feed(</sometag>)
parser.close()
In this example the handler would receive five events and print:
sometag started
sometag-child started
"text" data read
sometag-child ended
sometag ended
All data read
If I am understanding your question correctly, you have a log file that is always being appended to so you get something like:
<root>
<entry> ... </entry>
<entry> ... </entry>
...
<entry> ... </entry
<!-- no closing root -->
In this case you DON'T want to use a DOM parser because it tries to read a complete document and would choke on the missing tag. Instead, a SAX or Pull parser would work because it reads the document like a stream of data rather than a complete tree. As Denis replied above, you could either close the missing tag at the end or ignore any incomplete tags before writing it out.
XML parsing on Wikipedia
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With