
How to output CDATA using ElementTree

Tags: python, xml

I've discovered that cElementTree is about 30 times faster than xml.dom.minidom, so I'm rewriting my XML encoding/decoding code to use it. However, I need to output XML that contains CDATA sections, and there doesn't seem to be a way to do that with ElementTree.

Can it be done?

elifiner asked Oct 06 '08 15:10



2 Answers

After a bit of work, I found the answer myself. Looking at the ElementTree.py source code, I found special handling for XML comments and processing instructions. Each one gets a factory function that creates an element whose tag is a special (non-string) value, namely the factory function itself, which distinguishes it from regular elements.

    def Comment(text=None):
        element = Element(Comment)
        element.text = text
        return element

Then, in the _write function of ElementTree that actually outputs the XML, there is special-case handling for comments:

    if tag is Comment:
        file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
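The tag-as-marker trick can still be observed in the standard library's `xml.etree.ElementTree`. A small sketch, assuming Python 3 and the stdlib only (the variable names are illustrative and not from the original answer):

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
comment = ET.Comment("generated file")
root.append(comment)

# The factory function itself is stored as the tag; that identity check is
# how the serializer tells a comment apart from a regular element.
is_special = comment.tag is ET.Comment

serialized = ET.tostring(root, encoding="unicode")
print(is_special)
print(serialized)
```

Because the marker is compared by identity rather than by name, it cannot collide with any real element called "Comment".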

To support CDATA sections, I created a factory function called CDATA, subclassed ElementTree, and overrode its _write method to handle the CDATA elements.

This still doesn't help if you want to parse an XML document that contains CDATA sections and then write it back out with those sections intact, because the parser does not record which text came from CDATA. It does let you create XML with CDATA sections programmatically, which is what I needed.
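The round-trip limitation is easy to reproduce. A minimal sketch, assuming Python 3's standard `xml.etree.ElementTree` (the sample document is made up for illustration):

```python
import xml.etree.ElementTree as ET

source = "<data><![CDATA[<b>bold</b> & more]]></data>"
elem = ET.fromstring(source)

# The parser hands back plain text; nothing records that it came from CDATA.
parsed_text = elem.text

# On output, the same text is escaped with entities instead of being
# wrapped in a CDATA section again.
round_tripped = ET.tostring(elem, encoding="unicode")

print(parsed_text)
print(round_tripped)
```

The content is preserved, so any XML consumer sees the same text; only the CDATA wrapper is gone.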

The implementation seems to work with both ElementTree and cElementTree.

    import elementtree.ElementTree as etree
    #~ import cElementTree as etree

    def CDATA(text=None):
        # Factory function: the function object itself serves as the tag,
        # mirroring how Comment is implemented.
        element = etree.Element(CDATA)
        element.text = text
        return element

    class ElementTreeCDATA(etree.ElementTree):
        def _write(self, file, node, encoding, namespaces):
            if node.tag is CDATA:
                # Write the text unescaped inside a CDATA section.
                text = node.text.encode(encoding)
                file.write("\n<![CDATA[%s]]>\n" % text)
            else:
                # Everything else goes through the stock serializer.
                etree.ElementTree._write(self, file, node, encoding, namespaces)

    if __name__ == "__main__":
        import sys

        text = """
        <?xml version='1.0' encoding='utf-8'?>
        <text>
        This is just some sample text.
        </text>
        """

        e = etree.Element("data")
        cdata = CDATA(text)
        e.append(cdata)
        et = ElementTreeCDATA(e)
        et.write(sys.stdout, "utf-8")
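For readers on Python 3, here is a rough sketch of the same idea against the standard library's `xml.etree.ElementTree`. It is not the code from this answer: current versions have no `_write` method to override, so the sketch swaps in a patch of the module-private `_serialize_xml` function and `_serialize` table instead. Those are CPython implementation details, so this is an assumption that may break between versions. `CDATA_TAG` and `_serialize_xml_with_cdata` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker tag; a string is used because the Python 3 serializer
# rejects arbitrary non-string tags.
CDATA_TAG = "![CDATA["


def CDATA(text=None):
    """Factory for a pseudo-element that is written as a CDATA section."""
    element = ET.Element(CDATA_TAG)
    element.text = text
    return element


# Private CPython internals: assumed to exist with this signature.
_original_serialize_xml = ET._serialize_xml


def _serialize_xml_with_cdata(write, elem, qnames, namespaces,
                              short_empty_elements, **kwargs):
    if elem.tag == CDATA_TAG:
        # "]]>" cannot appear inside CDATA, so split it across two sections.
        text = (elem.text or "").replace("]]>", "]]]]><![CDATA[>")
        write("<![CDATA[%s]]>" % text)
        if elem.tail:
            write(ET._escape_cdata(elem.tail))
    else:
        _original_serialize_xml(write, elem, qnames, namespaces,
                                short_empty_elements, **kwargs)


# Patch both the module-level name (used for recursion into children)
# and the dispatch table entry (used for the root element).
ET._serialize_xml = ET._serialize["xml"] = _serialize_xml_with_cdata

root = ET.Element("data")
root.append(CDATA("<text>This is just some sample text.</text>"))
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

Unlike the subclass above, this patch is global: every serialization in the process goes through the wrapper, so it is best applied once at import time. It also omits the newlines that the original writes around the CDATA section.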
elifiner answered Sep 17 '22 15:09


lxml supports CDATA and offers an API similar to ElementTree's.
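A short sketch of what that might look like, assuming the third-party `lxml` package is installed. `etree.CDATA` and the `strip_cdata` parser option are part of lxml's API; the helper names `build_with_lxml` and `round_trip_with_lxml` are made up for this example.

```python
def build_with_lxml(payload):
    # lxml is third-party (pip install lxml); it is imported inside the
    # function so the ImportError surfaces only when the helper is called.
    from lxml import etree

    root = etree.Element("data")
    root.text = etree.CDATA(payload)  # wrap the text in a CDATA section
    return etree.tostring(root, encoding="unicode")


def round_trip_with_lxml(xml_text):
    from lxml import etree

    # strip_cdata=False asks the parser to keep CDATA sections, so they
    # are still present when the tree is serialized again.
    parser = etree.XMLParser(strip_cdata=False)
    root = etree.fromstring(xml_text, parser)
    return etree.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_with_lxml("<text>This is just some sample text.</text>"))
    print(round_trip_with_lxml("<data><![CDATA[<b>raw</b>]]></data>"))
```

The second helper covers the parse-and-rewrite case that the ElementTree subclass approach does not handle.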

iny answered Sep 17 '22 15:09