 

Parsing XML with XPath in Python 3

Tags: python, xml

I have the following xml:

<document>
  <internal-code code="201">
    <internal-desc>Biscuits Wrapped</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Wrapped)</web-sub-category>
  </internal-code>
  <internal-code code="202">
    <internal-desc>Biscuits Sweet</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Sweets)</web-sub-category>
  </internal-code>
  <internal-code code="221">
    <internal-desc>Biscuits Savoury</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits For Cheese</web-sub-category>
  </internal-code>
  ....
</document>

I have loaded it into a tree using this code:

try:
  groups = etree.parse(PRODUCT_GROUPS_XML_FILEPATH)
  root = groups.getroot()
  internalGroup = root.findall("./internal-code")
  LOG.append("[INFO] product groupings file loaded and parsed ok")
except Exception as e:
  LOG.append("[ERROR] PRODUCT GROUPINGS XML FILE ACCESS PROBLEM")
  LOG.append("[***TERMINATED***]")
  writelog()
  exit()

I would like to use XPath to find the correct internal-code element and then access the child nodes of that group. For example, if I am searching for internal-code 221 and want its web-category, I would like to do something like:

internalGroup.find("internal-code", 221).get("web-category").text

I am not experienced with XML and Python and I have been staring at this for ages. All help very gratefully received. Thanks

Zuriar asked Feb 07 '14 12:02




1 Answer

According to the xml.etree.ElementTree documentation:

XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
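That said, the supported subset does include attribute-equality predicates such as [@code="221"], so the lookup in the question can be sketched with the standard library alone. This is a minimal sketch, not a definitive solution: the XML is a trimmed copy of the question's data, and the names XML, node and sub_category are made up for illustration. What the subset does not offer is a text() step, so the text is read from the element's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (illustrative data only).
XML = """
<document>
  <internal-code code="202">
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits (Sweets)</web-sub-category>
  </internal-code>
  <internal-code code="221">
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits For Cheese</web-sub-category>
  </internal-code>
</document>
"""

root = ET.fromstring(XML)

# [@code="221"] selects the internal-code element whose code attribute is "221";
# the trailing step then picks its web-sub-category child.
node = root.find('./internal-code[@code="221"]/web-sub-category')

# find() returns None when nothing matches, so guard before reading .text.
sub_category = node.text if node is not None else None
print(sub_category)  # Biscuits For Cheese
```

Anything beyond this subset (functions, text() steps, most axes) needs a full XPath engine.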

For full XPath support, use lxml:

>>> import lxml.etree as ET
>>>
>>> s = '''
... <document>
...   <internal-code code="201">
...     <internal-desc>Biscuits Wrapped</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits (Wrapped)</web-sub-category>
...   </internal-code>
...   <internal-code code="202">
...     <internal-desc>Biscuits Sweet</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits (Sweets)</web-sub-category>
...   </internal-code>
...   <internal-code code="221">
...     <internal-desc>Biscuits Savoury</internal-desc>
...     <top-grouping>Finished</top-grouping>
...     <web-category>Biscuits</web-category>
...     <web-sub-category>Biscuits For Cheese</web-sub-category>
...   </internal-code>
... </document>
... '''
>>>
>>> root = ET.fromstring(s)
>>> for text in root.xpath('.//internal-code[@code="221"]/web-category/text()'):
...     print(text)
...
Biscuits
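
The question also asks how to reach all the child nodes of the matched group, not just one value. A minimal sketch of that: fetch the internal-code element itself, then iterate over its children. The helper name group_fields is hypothetical, and the sketch assumes the code is a plain value with no quote characters in it, since it is interpolated directly into the path. It uses the standard library so that it runs without lxml installed; the same find() call should also work on lxml elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (illustrative data only).
XML = """
<document>
  <internal-code code="201">
    <internal-desc>Biscuits Wrapped</internal-desc>
    <web-category>Biscuits</web-category>
  </internal-code>
  <internal-code code="221">
    <internal-desc>Biscuits Savoury</internal-desc>
    <top-grouping>Finished</top-grouping>
    <web-category>Biscuits</web-category>
    <web-sub-category>Biscuits For Cheese</web-sub-category>
  </internal-code>
</document>
"""


def group_fields(root, code):
    """Return {child tag: child text} for the matching internal-code, or None.

    Hypothetical helper; assumes `code` contains no quote characters.
    """
    group = root.find(f'./internal-code[@code="{code}"]')
    if group is None:
        return None
    # Iterating over an element yields its direct children in document order.
    return {child.tag: child.text for child in group}


root = ET.fromstring(XML)
fields = group_fields(root, 221)
print(fields["web-category"])      # Biscuits
print(fields["web-sub-category"])  # Biscuits For Cheese
```

Returning None for an unknown code keeps the "not found" case explicit instead of raising an AttributeError on a missing element.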

falsetru answered Sep 28 '22 19:09