Parsing CDATA in XML with Python

Q: What does <![CDATA[ mean in XML?

CDATA stands for Character Data. A CDATA section is a block of text that the parser does not interpret as markup: everything between <![CDATA[ and ]]> is passed through as literal character data. It is an alternative to the predefined entities such as &lt;, &gt;, and &amp;, which are tedious to type and hard to read in the markup.
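As a small illustration (a sketch using only the standard library, with a made-up <msg> element that is not part of the question), the same text can be written once with escaped entities and once inside a CDATA section. After parsing, both yield the same string:

```python
import xml.etree.ElementTree as ET

# The same content, written with predefined entities ...
escaped = ET.fromstring("<msg>if a &lt; b &amp;&amp; b &gt; c</msg>")
# ... and inside a CDATA section, where < > & need no escaping.
cdata = ET.fromstring("<msg><![CDATA[if a < b && b > c]]></msg>")

print(escaped.text)                # if a < b && b > c
print(cdata.text)                  # if a < b && b > c
print(escaped.text == cdata.text)  # True
```

The CDATA markers themselves are not part of the parsed text; only the characters between them are.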

Tags: python, parsing, xml, lxml

I need to parse an XML file with a number of blocks of CDATA that I need to retain for later plotting:

<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>

I will need to do this repeatedly and quickly, so I am looking for the best way to do it. I've read that ElementTree is the fastest option, but I am open to other suggestions.

460

asked Dec 04 '12 00:12

Jen

1 Answer

Here are two examples of how to do it:

from lxml import etree
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

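def parse_with_lxml():
    # lxml: select every <log> element with XPath
    root = etree.fromstring(CONTENT)
    for log in root.xpath("//log"):
        print(log.text)

def parse_with_stdlib():
    # standard library: walk the tree and yield every <log> element
    root = ElementTree.fromstring(CONTENT)
    for log in root.iter('log'):
        print(log.text)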

if __name__ == '__main__':
    parse_with_lxml()
    parse_with_stdlib()

Output:

timestamp value
timestamp value, timestamp value, timestamp
timestamp value
timestamp value, timestamp value, timestamp

In both cases the element's text attribute returns the CDATA content as a plain string.
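To keep the data around for later plotting, the text can be split into pairs. The following is only a sketch under assumptions the question does not confirm: each entry is taken to be a numeric timestamp and a numeric value separated by whitespace, entries are separated by commas, and incomplete entries are skipped. The sample numbers and the read_logs helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the numbers stand in for "timestamp value".
SAMPLE = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[1 0.5]]></log>
 <log name="name2" device="device2"><![CDATA[1 0.5, 2 0.75, 3 1.0]]></log>
</process>
"""

def read_logs(xml_text):
    """Return {(name, device): [(timestamp, value), ...]} for every <log>."""
    series = {}
    root = ET.fromstring(xml_text)
    for log in root.iter("log"):
        points = []
        # log.text is None for an empty element, so fall back to "".
        for pair in (log.text or "").split(","):
            fields = pair.split()
            if len(fields) != 2:
                continue  # skip empty or incomplete entries
            points.append((float(fields[0]), float(fields[1])))
        series[(log.get("name"), log.get("device"))] = points
    return series

logs = read_logs(SAMPLE)
for key, points in logs.items():
    print(key, points)
```

Each list of (timestamp, value) tuples can then be unpacked with zip(*points) into x and y sequences for a plotting library.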

130

answered Oct 16 '22 10:10

Joe
