How do I use xml namespaces with find/findall in lxml?

Tags: python, xml, xml-namespaces, lxml, elementtree

I'm trying to parse the content of an OpenOffice ODS spreadsheet. The ODS format is essentially a zip file containing several documents. The content of the spreadsheet is stored in 'content.xml'.

    import zipfile
    from lxml import etree

    zf = zipfile.ZipFile('spreadsheet.ods')
    root = etree.parse(zf.open('content.xml'))

The content of the spreadsheet is in a table:

    table = root.find('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table')

We can also go straight for the rows:

    rows = root.findall('.//{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table-row')

The individual elements know about the namespaces:

    >>> table.nsmap['table']
    'urn:oasis:names:tc:opendocument:xmlns:table:1.0'

How do I use the namespaces directly in find/findall?

The obvious solution does not work.

Trying to find the table by its prefixed name:

    >>> root.findall('.//table:table')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 1792, in lxml.etree._ElementTree.findall (src/lxml/lxml.etree.c:41770)
      File "lxml.etree.pyx", line 1297, in lxml.etree._Element.findall (src/lxml/lxml.etree.c:37027)
      File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 225, in findall
        return list(iterfind(elem, path))
      File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 200, in iterfind
        selector = _build_path_iterator(path)
      File "/usr/lib/python2.6/dist-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
        selector.append(ops[token[0]](_next, token))
    KeyError: ':'

asked Nov 18 '10 01:11

saffsd

1 Answer

If root.nsmap contains the table namespace prefix, you can use xpath() with that mapping:

    root.xpath('.//table:table', namespaces=root.nsmap)
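Two caveats seem worth spelling out here. As far as I know, the object returned by etree.parse() is an ElementTree, and the nsmap attribute lives on elements, so in the question's setup the mapping would come from root.getroot().nsmap. Also, nsmap can contain a None key for a default namespace, which xpath() does not accept as a prefix. A minimal sketch, using a small inline document as a hypothetical stand-in for a real content.xml:

```python
from lxml import etree

# Hypothetical stand-in for content.xml; a real ODS file declares many more namespaces.
XML = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">
  <office:body>
    <table:table table:name="Sheet1">
      <table:table-row/>
      <table:table-row/>
    </table:table>
  </office:body>
</office:document-content>"""

root = etree.fromstring(XML)  # an element, so it has .nsmap

# xpath() rejects a None (default namespace) prefix, so drop it if present.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

tables = root.xpath('.//table:table', namespaces=ns)
rows = root.xpath('.//table:table-row', namespaces=ns)
print(len(tables), len(rows))
```

With a tree obtained from etree.parse(), the same filtering would be applied to tree.getroot().nsmap before calling tree.xpath().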

findall(path) accepts the {namespace}name syntax rather than namespace:name. The path therefore has to be converted to the {namespace}name form, using a namespace dictionary, before it is passed to findall().
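The preprocessing described above could be sketched roughly as follows. The helper name expand_path and the NSMAP dictionary are assumptions made for illustration, not part of lxml. The regex is deliberately simple: it assumes the path contains no existing {uri} parts and no predicates with colons in them. The last lines show that recent lxml releases (and the standard library ElementTree) also accept a namespaces argument on findall(), which makes the rewriting step unnecessary there; the traceback in the question comes from an older release.

```python
import re

try:
    from lxml import etree
except ImportError:  # the stdlib ElementTree accepts the same {uri}name paths
    import xml.etree.ElementTree as etree

TABLE_NS = 'urn:oasis:names:tc:opendocument:xmlns:table:1.0'
NSMAP = {'table': TABLE_NS}  # prefix -> namespace URI


def expand_path(path, nsmap):
    """Rewrite each prefix:name step in path as {uri}name (hypothetical helper)."""
    def repl(match):
        prefix, name = match.group(1), match.group(2)
        return '{%s}%s' % (nsmap[prefix], name)  # KeyError for an unknown prefix
    return re.sub(r'([A-Za-z_][\w.-]*):([A-Za-z_][\w.-]*)', repl, path)


# Small inline document standing in for the spreadsheet's table element.
doc = etree.fromstring(
    '<table:table xmlns:table="%s">'
    '<table:table-row/><table:table-row/>'
    '</table:table>' % TABLE_NS
)

# Approach 1: expand the prefixes by hand, then call findall() as usual.
path = expand_path('.//table:table-row', NSMAP)
rows = doc.findall(path)

# Approach 2: pass the prefix map directly via the namespaces argument.
rows_kw = doc.findall('.//table:table-row', namespaces=NSMAP)

print(path, len(rows), len(rows_kw))
```

Where the installed version supports it, the namespaces argument is probably the simpler choice; the helper is mainly useful for versions that raise the KeyError shown in the question.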

answered Oct 15 '22 06:10

jfs
