Parsing XML with XPath in Python 3

Tags: python, xml

I have the following XML:

<document>
  <internal-code code="201">
    <internal-desc>Biscuits Wrapped</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Wrapped)</web-sub-category>
  </internal-code>
  <internal-code code="202">
    <internal-desc>Biscuits Sweet</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Sweets)</web-sub-category>
  </internal-code>
  <internal-code code="221">
    <internal-desc>Biscuits Savoury</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits For Cheese</web-sub-category>
  </internal-code>
  ....
</document>

I have loaded it into a tree using this code:

try:
  groups = etree.parse(PRODUCT_GROUPS_XML_FILEPATH)
  root = groups.getroot()
  internalGroup = root.findall("./internal-code")
  LOG.append("[INFO] product groupings file loaded and parsed ok")
except Exception as e:
  LOG.append("[ERROR] PRODUCT GROUPINGS XML FILE ACCESS PROBLEM")
  LOG.append("[***TERMINATED***]")
  writelog()
  exit()

I would like to use XPath to find the correct group and then access the child nodes of that group. So if I am searching for internal-code 221 and want web-category, I would do something like:

internalGroup.find("internal-code", 221).get("web-category").text

I am not experienced with XML and Python and I have been staring at this for ages. All help very gratefully received. Thanks


asked Feb 07 '14 12:02

Zuriar

1 Answer

According to the xml.etree.ElementTree documentation:

XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

For a full XPath engine, use lxml:

>>> import lxml.etree as ET
>>>
>>> s = '''
... <document>
...   <internal-code code="201">
...     <internal-desc>Biscuits Wrapped</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits (Wrapped)</web-sub-category>
...   </internal-code>
...   <internal-code code="202">
...     <internal-desc>Biscuits Sweet</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits (Sweets)</web-sub-category>
...   </internal-code>
...   <internal-code code="221">
...     <internal-desc>Biscuits Savoury</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits For Cheese</web-sub-category>
...   </internal-code>
... </document>
... '''
>>>
>>> root = ET.fromstring(s)
>>> for text in root.xpath('.//internal-code[@code="221"]/web-category/text()'):
...     print(text)
...
Biscuits
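
If installing lxml is not an option, the limited XPath subset in xml.etree.ElementTree does include attribute predicates of the form [@attrib='value'], which should be enough for this particular lookup. Below is a minimal sketch under that assumption. The helper name child_text is hypothetical (it is not part of either library), and the XML is a shortened copy of the sample from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question.
XML = """
<document>
  <internal-code code="201">
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Wrapped)</web-sub-category>
  </internal-code>
  <internal-code code="221">
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits For Cheese</web-sub-category>
  </internal-code>
</document>
"""


def child_text(root, code, child_tag):
    """Hypothetical helper: return the text of child_tag inside the
    internal-code element whose code attribute equals code, or None."""
    # The code is interpolated into the path, so this assumes it
    # contains no quote characters.
    group = root.find(f"./internal-code[@code='{code}']")
    if group is None:
        return None
    # findtext returns None when the child element is missing.
    return group.findtext(child_tag)


root = ET.fromstring(XML)
print(child_text(root, 221, "web-sub-category"))  # Biscuits For Cheese
```

The same root.find(...) call should work on the root obtained from etree.parse(...).getroot() in the question, provided etree there refers to xml.etree.ElementTree or lxml.etree.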

answered Sep 28 '22 19:09

falsetru
