I am attempting to use Lxml to parse the contents of a .docx document. I understand that lxml replaces namespace prefixes with the actual namespace, however this makes it a real pain to check what kind of element tag I am working with. I would like to be able to do something like <pre class="prettyprint"><code>if (someElement.tag == "w:p"): </code></pre> but since lxml insists on prepending te ful namespace I'd either have to do something like <pre class="prettyprint"><code>if (someElemenet.tag == "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p'): </code></pre> or perform a lookup of the full namespace name from the element's nsmap attribute like this <pre class="prettyprint"><code>targetTag = "{%s}p" % someElement.nsmap['w'] if (someElement.tag == targetTag): </code></pre> If there were was an easier way to convince lxml to either <ol> <li>Give me the tag string without the namespace appended to it, I can use the prefix attribute along with this information to check which tag I'm working with OR</li> <li>Just give me the tag string using the prefix</li> </ol> This would save a lot of keystrokes when writing this parser. Is this possible? Am I missing something in the documentation?

Perhaps use local-name(): <pre class="prettyprint"><code>import lxml.etree as ET tree = ET.fromstring('<root xmlns:f="foo"><f:test/></root>') elt=tree[0] print(elt.xpath('local-name()')) # test </code></pre>

Lxml element equality with namespaces

Tags:

python

xml-namespaces

lxml

I am attempting to use Lxml to parse the contents of a .docx document. I understand that lxml replaces namespace prefixes with the actual namespace, however this makes it a real pain to check what kind of element tag I am working with. I would like to be able to do something like

if (someElement.tag == "w:p"):

but since lxml insists on prepending te ful namespace I'd either have to do something like

if (someElemenet.tag == "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p'):

or perform a lookup of the full namespace name from the element's nsmap attribute like this

targetTag = "{%s}p" % someElement.nsmap['w']
if (someElement.tag == targetTag):

If there were was an easier way to convince lxml to either

Give me the tag string without the namespace appended to it, I can use the prefix attribute along with this information to check which tag I'm working with OR
Just give me the tag string using the prefix

This would save a lot of keystrokes when writing this parser. Is this possible? Am I missing something in the documentation?

208

asked Mar 30 '11 23:03

mjn12

1 Answers

Perhaps use local-name():

import lxml.etree as ET
tree = ET.fromstring('<root xmlns:f="foo"><f:test/></root>')
elt=tree[0]
print(elt.xpath('local-name()'))
# test

176

answered Sep 22 '22 09:09

unutbu

Related questions
                            
                                How to check if a string contains only characters from a given set in python
                            
                                Python Terminal/Text UI (TUI) library [closed]
                            
                                Displaying a Youtube clip in python
                            
                                How to fade color
                            
                                display an octal value as its string representation
                            
                                Better way to remove multiple words from a string?
                            
                                Subtracting n days from date in Python [duplicate]
                            
                                Install wxPython in osx 10.11
                            
                                How can I download just thumbnails using youtube-dl?
                            
                                ImportError: cannot import name 'MNIST'
                            
                                How to add sender name before sender address in python email script
                            
                                How can I compile Python 3.6.2 on macOS with openSSL from homebrew?
                            
                                Get All Unique Values in a Row
                            
                                ssl.get_server_certificate for sites with SNI (Server Name Indication)
                            
                                Pip cannot find metadata file - EnvironmentError
                            
                                Display grouped list of items from python list cyclically
                            
                                Scripting in Java [closed]
                            
                                Where is Python used? I read about it a lot on Reddit
                            
                                C++ Structure within itself?
                            
                                Get the number of occurrences of each character

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With