How do I use xml namespaces with find/findall in lxml?

I'm trying to parse content in an OpenOffice ODS spreadsheet. The ODS format is essentially a zip file containing a number of documents. The content of the spreadsheet is stored in 'content.xml'.

import zipfile
from lxml import etree

zf = zipfile.ZipFile('spreadsheet.ods')
root = etree.parse(zf.open('content.xml'))

The content of the spreadsheet is in a table element:

table = root.find('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table') 

We can also go straight for the rows:

rows = root.findall('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table-row') 

The individual elements know about the namespaces:

>>> table.nsmap['table']
'urn:oasis:names:tc:opendocument:xmlns:table:1.0'

How do I use the namespaces directly in find/findall?

The obvious solution does not work.

Trying to get the table using the namespace prefix:

>>> root.findall('.//table:table')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1792, in lxml.etree._ElementTree.findall (src/lxml/lxml.etree.c:41770)
  File "lxml.etree.pyx", line 1297, in lxml.etree._Element.findall (src/lxml/lxml.etree.c:37027)
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 225, in findall
    return list(iterfind(elem, path))
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 200, in iterfind
    selector = _build_path_iterator(path)
  File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
KeyError: ':'
saffsd asked Nov 18 '10 01:11


1 Answer

If root.nsmap contains the table namespace prefix, then you could use xpath() with that mapping:

root.xpath('.//table:table', namespaces=root.nsmap) 
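One caveat, offered as an assumption about lxml's behavior rather than something stated in this answer: xpath() is understood not to accept a None key (the default namespace) in the namespaces mapping, so if the nsmap might contain one it is safer to filter it out first. A minimal sketch, where xpath_namespaces is a hypothetical helper name:

```python
def xpath_namespaces(nsmap):
    """Return a copy of nsmap without the default-namespace (None) entry.

    Hypothetical helper: the name and the need for it are assumptions,
    based on xpath() not accepting an empty namespace prefix.
    """
    return {prefix: uri for prefix, uri in nsmap.items() if prefix is not None}


# Intended use, assuming elem is an lxml element:
#   elem.xpath('.//table:table', namespaces=xpath_namespaces(elem.nsmap))
```

If the nsmap has no default namespace, the helper returns an equivalent dictionary and the call behaves like the one above.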

findall(path) accepts the {namespace}name syntax rather than namespace:name. The path should therefore be converted to the {namespace}name form, using a namespace dictionary, before it is passed to findall().
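The preprocessing step described above might look like the following. This is only a sketch under stated assumptions: expand_prefixes is a hypothetical helper name, the path is assumed to contain only plain prefix:name steps (no existing {namespace}name parts and no string literals containing colons), and every prefix is assumed to be present in the dictionary.

```python
import re

# Matches "prefix:local" pairs such as "table:table-row".
_QNAME = re.compile(r'([A-Za-z_][\w.\-]*):([A-Za-z_][\w.\-]*)')


def expand_prefixes(path, nsmap):
    """Rewrite prefix:name steps in path to the {namespace}name form.

    Hypothetical helper; raises KeyError if a prefix is not in nsmap.
    """
    def to_clark(match):
        prefix, local = match.group(1), match.group(2)
        return '{%s}%s' % (nsmap[prefix], local)

    return _QNAME.sub(to_clark, path)


TABLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:table:1.0'
path = expand_prefixes('.//table:table-row', {'table': TABLE_NS})
# path can now be passed to findall(), e.g. root.findall(path)
```

A regular expression keeps the sketch short; a more robust version would tokenize the path so that quoted predicate values are left untouched.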

jfs answered Oct 15 '22 06:10