I have the following xml:
<document>
<internal-code code="201">
<internal-desc>Biscuits Wrapped</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits (Wrapped)</web-sub-category>
</internal-code>
<internal-code code="202">
<internal-desc>Biscuits Sweet</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits (Sweets)</web-sub-category>
</internal-code>
<internal-code code="221">
<internal-desc>Biscuits Savoury</internal-desc>
<top-grouping>Finished</top-grouping>
<web-category>Biscuits</web-category>
<web-sub-category>Biscuits For Cheese</web-sub-category>
</internal-code>
....
</document>
I have loaded it into a tree using this code:
try:
groups = etree.parse(PRODUCT_GROUPS_XML_FILEPATH)
root = groups.getroot()
internalGroup = root.findall("./internal-code")
LOG.append("[INFO] product groupings file loaded and parsed ok")
except Exception as e:
LOG.append("[ERROR] PRODUCT GROUPINGS XML FILE ACCESS PROBLEM")
LOG.append("[***TERMINATED***]")
writelog()
exit()
I would like to use XPath to find the correct and then be able to access the child nodes of that group. So if I am searching for internal-code 221 and want web-category I would do something like:
internalGroup.find("internal-code", 221).get("web-category").text
I am not experienced with XML and Python and I have been staring at this for ages. All help very gratefully received. Thanks
To find the XPath for a particular element on a page:Right-click the element in the page and click on Inspect. Right click on the element in the Elements Tab. Click on copy XPath.
3.2 Parsing an XML String We use the ElementTree. fromstring() method to parse an XML string. The method returns root Element directly: a subtle difference compared with the ElementTree. parse() method which returns an ElementTree object.
Python XML Parsing Modules Python allows parsing these XML documents using two modules namely, the xml. etree. ElementTree module and Minidom (Minimal DOM Implementation). Parsing means to read information from a file and split it into pieces by identifying parts of that particular XML file.
According to xml.etree.ElementTree
documentation:
XPath support
This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
Use lxml
:
>>> import lxml.etree as ET
>>>
>>> s = '''
... <document>
... <internal-code code="201">
... <internal-desc>Biscuits Wrapped</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits (Wrapped)</web-sub-category>
... </internal-code>
... <internal-code code="202">
... <internal-desc>Biscuits Sweet</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits (Sweets)</web-sub-category>
... </internal-code>
... <internal-code code="221">
... <internal-desc>Biscuits Savoury</internal-desc>
... <top-grouping>Finished</top-grouping>
... <web-category>Biscuits</web-category>
... <web-sub-category>Biscuits For Cheese</web-sub-category>
... </internal-code>
... </document>
... '''
>>>
>>> root = ET.fromstring(s)
>>> for text in root.xpath('.//internal-code[@code="221"]/web-category/text()'):
... print(text)
...
Biscuits
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With