libxml2 has a number of advantages: <ol> <li>Compliance to the spec </li> <li>Active development and a community participation </li> <li>Speed. This is really a python wrapper around a C implementation. </li> <li>Ubiquity. The libxml2 library is pervasive and thus well tested.</li> </ol> Downsides include: <ol> <li>Compliance to the spec. It's strict. Things like default namespace handling are easier in other libraries.</li> <li>Use of native code. This can be a pain depending on your how your application is distributed / deployed. RPMs are available that ease some of this pain.</li> <li>Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.</li> </ol> If you are doing simple path selection, stick with ElementTree ( which is included in Python 2.5 ). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2. Sample of libxml2 XPath Use <hr> <pre class="prettyprint"><code>import libxml2 doc = libxml2.parseFile("tst.xml") ctxt = doc.xpathNewContext() res = ctxt.xpathEval("//*") if len(res) != 2: print "xpath query: wrong node set size" sys.exit(1) if res[0].name != "doc" or res[1].name != "foo": print "xpath query: wrong node set value" sys.exit(1) doc.freeDoc() ctxt.xpathFreeContext() </code></pre> Sample of ElementTree XPath Use <hr> <pre class="prettyprint"><code>from elementtree.ElementTree import ElementTree mydoc = ElementTree(file='tst.xml') for e in mydoc.findall('/foo/bar'): print e.get('title').text</code></pre> <hr> The lxml package supports xpath. It seems to work pretty well, although I've had some trouble with the self:: axis. There's also Amara, but I haven't used it personally. Sounds like an lxml advertisement in here. ;) ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ much improved: <pre class="prettyprint lang-py prettyprint-override"><code>import xml.etree.ElementTree as ET root = ET.parse(filename) result = '' for elem in root.findall('.//child/grandchild'): # How to make decisions based on attributes even in 2.6: if elem.attrib.get('name') == 'foo': result = elem.text break </code></pre> Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more "Pythonic" bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElemenTree supports a limited subset of XPath, although it may be good enough for your needs. Another option is py-dom-xpath, it works seamlessly with minidom and is pure Python so works on appengine. <pre class="prettyprint"><code>import xpath xpath.find('//item', doc) </code></pre>

How to use XPath in Python?

Tags:

xpath

libxml2 has a number of advantages:

Compliance to the spec
Active development and a community participation
Speed. This is really a python wrapper around a C implementation.
Ubiquity. The libxml2 library is pervasive and thus well tested.

Downsides include:

Compliance to the spec. It's strict. Things like default namespace handling are easier in other libraries.
Use of native code. This can be a pain depending on your how your application is distributed / deployed. RPMs are available that ease some of this pain.
Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.

If you are doing simple path selection, stick with ElementTree ( which is included in Python 2.5 ). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.

Sample of libxml2 XPath Use

import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print "xpath query: wrong node set size"
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print "xpath query: wrong node set value"
    sys.exit(1)
doc.freeDoc()
ctxt.xpathFreeContext()

Sample of ElementTree XPath Use

from elementtree.ElementTree import ElementTree
mydoc = ElementTree(file='tst.xml')
for e in mydoc.findall('/foo/bar'):
    print e.get('title').text

The lxml package supports xpath. It seems to work pretty well, although I've had some trouble with the self:: axis. There's also Amara, but I haven't used it personally.

Sounds like an lxml advertisement in here. ;) ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ much improved:

import xml.etree.ElementTree as ET
root = ET.parse(filename)
result = ''

for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes even in 2.6:
    if elem.attrib.get('name') == 'foo':
        result = elem.text
        break

Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more "Pythonic" bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElemenTree supports a limited subset of XPath, although it may be good enough for your needs.

Another option is py-dom-xpath, it works seamlessly with minidom and is pure Python so works on appengine.

import xpath
xpath.find('//item', doc)

Related questions
                            
                                How to open a file using the open with statement
                            
                                sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 74 supplied
                            
                                pyplot axes labels for subplots
                            
                                How do I load a file into the python console?
                            
                                Python app does not print anything when running detached in docker
                            
                                Django Server Error: port is already in use
                            
                                Clear variable in python
                            
                                When should I use uuid.uuid1() vs. uuid.uuid4() in python?
                            
                                Python argparse: default value or specified value
                            
                                How do you programmatically set an attribute?
                            
                                How to sort with lambda in Python
                            
                                TypeError: got multiple values for argument
                            
                                Override Python's 'in' operator?
                            
                                Check if a number is int or float
                            
                                Transposing a 1D NumPy array
                            
                                Format a datetime into a string with milliseconds
                            
                                How do I format a string using a dictionary in python-3.x?
                            
                                Controlling mouse with Python
                            
                                matplotlib Legend Markers Only Once
                            
                                Why does Python's hash of infinity have the digits of π?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to use XPath in Python?

Tags:

python

dom

xml

python-2.x

xpath

Recent Activity

Donate For Us