I'm trying to parse the content of an OpenOffice ODS spreadsheet. The ODS format is essentially a ZIP file containing a number of documents, and the content of the spreadsheet is stored in 'content.xml'.
import zipfile
from lxml import etree

zf = zipfile.ZipFile('spreadsheet.ods')
root = etree.parse(zf.open('content.xml'))
The content of the spreadsheet is in a table element:
table = root.find('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table')
We can also go straight for the rows:
rows = root.findall('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table-row')
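The steps so far can be tried end to end without a real spreadsheet. This is a minimal sketch that assumes a tiny, hand-made content.xml stored in an in-memory ZIP in place of 'spreadsheet.ods' (the names CONTENT_XML and make_fake_ods are hypothetical). It uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages; lxml's find/findall use the same {namespace}name path syntax.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical, heavily trimmed content.xml; a real ODS file contains much more.
CONTENT_XML = f"""<office:document-content xmlns:office="{OFFICE_NS}" xmlns:table="{TABLE_NS}">
  <office:body>
    <office:spreadsheet>
      <table:table table:name="Sheet1">
        <table:table-row><table:table-cell/></table:table-row>
        <table:table-row><table:table-cell/></table:table-row>
      </table:table>
    </office:spreadsheet>
  </office:body>
</office:document-content>"""


def make_fake_ods() -> io.BytesIO:
    """Build an in-memory ZIP that holds only content.xml (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", CONTENT_XML)
    buf.seek(0)
    return buf


# Open the archive and parse content.xml straight from the ZIP member.
with zipfile.ZipFile(make_fake_ods()) as zf:
    with zf.open("content.xml") as f:
        tree = ET.parse(f)

root = tree.getroot()
# '{{{TABLE_NS}}}' in an f-string yields '{urn:...}', i.e. the {namespace}name form.
rows = root.findall(f".//{{{TABLE_NS}}}table-row")
print(len(rows))  # 2
```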
The individual elements know about the namespaces:
>>> table.nsmap['table']
'urn:oasis:names:tc:opendocument:xmlns:table:1.0'
How do I use the namespaces directly in find/findall?
The obvious solution does not work.
Trying to get the table using the prefixed name:
>>> root.findall('.//table:table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1792, in lxml.etree._ElementTree.findall (src/lxml/lxml.etree.c:41770)
  File "lxml.etree.pyx", line 1297, in lxml.etree._Element.findall (src/lxml/lxml.etree.c:37027)
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 225, in findall
    return list(iterfind(elem, path))
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 200, in iterfind
    selector = _build_path_iterator(path)
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
KeyError: ':'
An XML namespace is a collection of names that can be used as element or attribute names in an XML document. The namespace qualifies element names uniquely on the Web in order to avoid conflicts between elements with the same name.
XML namespaces provide a method for qualifying the names of XML elements and XML attributes in XML documents. A qualified name consists of a prefix and a local name, separated by a colon. The prefix functions only as a placeholder; it is mapped to a URI that specifies a namespace.
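Because the prefix is only a placeholder, ElementTree-style parsers resolve it at parse time and store each tag as {URI}localname. A small sketch with the standard library's xml.etree.ElementTree (lxml stores tags the same way); the two sample documents are made up for illustration and bind different prefixes to the same table namespace URI.

```python
import xml.etree.ElementTree as ET

# Two documents that bind *different* prefixes to the same namespace URI.
DOC_A = '<t:table xmlns:t="urn:oasis:names:tc:opendocument:xmlns:table:1.0"/>'
DOC_B = '<table:table xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"/>'

tag_a = ET.fromstring(DOC_A).tag
tag_b = ET.fromstring(DOC_B).tag

# Both are '{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table':
# the prefix itself is not part of the stored element name.
print(tag_a)
print(tag_b)
print(tag_a == tag_b)  # True
```

This is why a path such as 'table:table' means nothing to findall() on its own: the prefix has to be turned back into its URI first.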
lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).
If root.nsmap contains the table namespace prefix, then you could use:

root.xpath('.//table:table', namespaces=root.nsmap)
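Current versions of lxml's findall() and the standard library's ElementTree findall() also take an optional namespaces mapping from prefix to URI, which avoids XPath entirely. A minimal sketch using xml.etree.ElementTree, assuming a hand-written NAMESPACES dict (hypothetical; with lxml you could start from an element's nsmap) and a trimmed sample document.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix mapping written out by hand. The prefixes used in the
# search path only need to match this dict, not the prefixes in the file.
NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}

CONTENT = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">'
    '<office:body><office:spreadsheet>'
    '<table:table table:name="Sheet1">'
    '<table:table-row/><table:table-row/><table:table-row/>'
    '</table:table>'
    '</office:spreadsheet></office:body>'
    '</office:document-content>'
)

root = ET.fromstring(CONTENT)

# Each 'table:' prefix in the path is looked up in NAMESPACES and expanded
# to '{urn:...table:1.0}' before matching.
table = root.find(".//table:table", namespaces=NAMESPACES)
rows = root.findall(".//table:table-row", namespaces=NAMESPACES)

# Namespaced attributes are stored in {namespace}name form as well.
print(table.get(f"{{{NAMESPACES['table']}}}name"))  # Sheet1
print(len(rows))  # 3
```

If the document declares a default namespace, lxml's nsmap contains a None key, and xpath() rejects an empty prefix, so it may be safer to pass a dict that leaves that entry out.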
findall(path) accepts the {namespace}name syntax instead of namespace:name. Therefore the path should be preprocessed with the namespace dictionary into the {namespace}name form before it is passed to findall().
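The preprocessing step can be sketched as a small regex rewrite. The helper name expand_prefixes is hypothetical, and it assumes prefixes and local names contain only letters, digits, '_', '-' and '.', and that the path has no quoted predicate values containing colons. Parts already written as {uri} are skipped so their colons are left alone. The demo uses the standard library's xml.etree.ElementTree; the expanded string is in the same {namespace}name form that lxml's findall() accepts in the question.

```python
import re
import xml.etree.ElementTree as ET

# Either an already-expanded '{...}' part (left untouched) or 'prefix:local'.
_TOKEN = re.compile(r"\{[^}]*\}|([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)")


def expand_prefixes(path: str, nsmap: dict) -> str:
    """Rewrite 'prefix:name' steps in a findall() path to '{uri}name'.

    Hypothetical helper for illustration; raises KeyError for an unknown prefix.
    """
    def repl(m: re.Match) -> str:
        if m.group(1) is None:  # matched a '{uri}' part: keep it as-is
            return m.group(0)
        return "{%s}%s" % (nsmap[m.group(1)], m.group(2))

    return _TOKEN.sub(repl, path)


NSMAP = {"table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0"}

# The document uses the prefix 't', but only the URI matters after expansion.
doc = ET.fromstring(
    '<t:table xmlns:t="urn:oasis:names:tc:opendocument:xmlns:table:1.0">'
    '<t:table-row/><t:table-row/></t:table>'
)

path = expand_prefixes(".//table:table-row", NSMAP)
print(path)  # .//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table-row
print(len(doc.findall(path)))  # 2
```

With lxml, root.nsmap could serve as the dictionary, as long as the prefixes used in the path are present in it.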