Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lxml element equality with namespaces

I am attempting to use Lxml to parse the contents of a .docx document. I understand that lxml replaces namespace prefixes with the actual namespace, however this makes it a real pain to check what kind of element tag I am working with. I would like to be able to do something like

if (someElement.tag == "w:p"):

but since lxml insists on prepending te ful namespace I'd either have to do something like

if (someElemenet.tag == "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}p'):

or perform a lookup of the full namespace name from the element's nsmap attribute like this

targetTag = "{%s}p" % someElement.nsmap['w']
if (someElement.tag == targetTag):

If there were was an easier way to convince lxml to either

  1. Give me the tag string without the namespace appended to it, I can use the prefix attribute along with this information to check which tag I'm working with OR
  2. Just give me the tag string using the prefix

This would save a lot of keystrokes when writing this parser. Is this possible? Am I missing something in the documentation?

like image 208
mjn12 Avatar asked Mar 30 '11 23:03

mjn12


People also ask

What is Etree in lxml?

lxml. etree only returns real Elements, i.e. tree nodes that have a string tag name. Without a filter, both libraries iterate over all nodes. Note that currently only lxml. etree supports passing the Element factory function as filter to select only Elements.

Is lxml a parser?

lxml provides a very simple and powerful API for parsing XML and HTML. It supports one-step parsing as well as step-by-step parsing using an event-driven API (currently only for XML).

What is lxml library in Python?

lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. There are a lot of off-the-shelf XML parsers out there, but for better results, developers sometimes prefer to write their own XML and HTML parsers. This is when the lxml library comes to play.


1 Answers

Perhaps use local-name():

import lxml.etree as ET
tree = ET.fromstring('<root xmlns:f="foo"><f:test/></root>')
elt=tree[0]
print(elt.xpath('local-name()'))
# test
like image 176
unutbu Avatar answered Sep 22 '22 09:09

unutbu