My XML looks like:
...
<termEntry id="c1">
<langSet xml:lang="de">
...
And I have the code:
from lxml import etree
...
for term_entry in root.iterfind('.//termEntry'):
    print(term_entry.attrib['id'])
    print(term_entry.nsmap)
    for lang_set in term_entry.iterfind('langSet'):
        print(lang_set.nsmap)
        print(lang_set.attrib)
        for some_stuff in lang_set.iterfind('some_stuff'):
            ...
I get an empty nsmap dict, and my attrib dict looks like {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
The file may not use the xml: prefix for the attribute, or it may use a different namespace. How can I find out which namespace was used in the tag declaration? In fact, I just need to get the lang attribute; I don't care which namespace was used. I don't want to use any crappy hack like lang_set.attrib.values()[0] or other lookups of a field with a known name.
I just need to get a lang attribute, I don't care what namespace was used
Your question is not very clear and you haven't provided any complete runnable code example. But doing some string manipulation as suggested by @mmgp in a comment may be enough.
However, xml:lang is not the same as random_prefix:lang (or just lang). I think you should care about the namespace. If the objective is to identify the natural language that applies to an element's content, then you should be using xml:lang (because that is the explicit purpose of this attribute; see http://www.w3.org/TR/REC-xml/#sec-lang-tag).
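If the string-manipulation route really is enough, one possible sketch is to compare only the local part of each attribute name. Namespaced attribute names are stored as {namespace-uri}localname, so dropping everything up to the closing brace leaves the local name. The helper name get_attr_by_local_name and the sample document are made up for illustration. Note that this deliberately ignores the namespace, so it would also pick up an unrelated lang attribute.

```python
try:
    from lxml import etree
except ImportError:
    # The standard-library parser uses the same {uri}name notation.
    import xml.etree.ElementTree as etree


def get_attr_by_local_name(element, local_name, default=None):
    """Return the first attribute whose local name matches, ignoring its namespace."""
    for key, value in element.attrib.items():
        # Keys look like '{namespace-uri}lang' or plain 'lang'.
        if key.rsplit('}', 1)[-1] == local_name:
            return value
    return default


# Hypothetical sample data modelled on the snippet in the question.
sample = etree.fromstring(
    '<termEntry id="c1"><langSet xml:lang="de"/><langSet lang="en"/></termEntry>'
)
langs = [get_attr_by_local_name(ls, 'lang') for ls in sample.iterfind('langSet')]
print(langs)
```

Passing a default keeps the caller from having to catch an exception when an element carries no such attribute.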
I just want to know where is stored the {http://www.w3.org/XML/1998/namespace} string for attributes.
It is important to know that the xml prefix is special. It is reserved (as opposed to almost all other namespace prefixes, which are supposed to be arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace.
From the Namespaces in XML 1.0 W3C recommendation:
The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.
Other uses of the xml prefix are the xml:space and xml:base attributes.
It is really strange, if lxml does not provide any method for namespace processing
lxml processes namespaces just fine, but prefixes are avoided as much as possible. You will need to use the http://www.w3.org/XML/1998/namespace namespace name when doing lookups that involve the xml prefix.
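A minimal sketch of such a lookup: the attribute is addressed by its fully qualified name in {namespace-uri}localname form. The constant names XML_NS and XML_LANG and the sample document are illustrative and not part of lxml. The sample never declares the xml prefix, which is allowed because the binding is implicit.

```python
try:
    from lxml import etree
except ImportError:
    # The standard-library parser accepts the same qualified names.
    import xml.etree.ElementTree as etree

XML_NS = 'http://www.w3.org/XML/1998/namespace'
XML_LANG = '{%s}lang' % XML_NS  # '{http://www.w3.org/XML/1998/namespace}lang'

# Hypothetical sample data; the second langSet has no xml:lang at all.
sample = etree.fromstring(
    '<termEntry id="c1"><langSet xml:lang="de"/><langSet/></termEntry>'
)

# get() returns None instead of raising when the attribute is absent.
found = [lang_set.get(XML_LANG) for lang_set in sample.iterfind('langSet')]
print(found)
```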
You could simply use XPath:
lang_set.xpath('./@xml:lang')[0]
By the way, are you working with TBX files?
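A slightly more defensive variant of the XPath lookup, as a sketch: xpath() returns a list, so indexing with [0] raises IndexError when the attribute is missing. This assumes, as the one-liner above does, that lxml's XPath engine resolves the reserved xml prefix without an explicit namespaces mapping. The helper name lang_of and the sample document are hypothetical.

```python
from lxml import etree


def lang_of(element, default=None):
    """Return the element's own xml:lang value, or default if it has none."""
    values = element.xpath('./@xml:lang')
    return values[0] if values else default


# Hypothetical sample data modelled on the snippet in the question.
sample = etree.fromstring(
    '<termEntry id="c1"><langSet xml:lang="de"/><langSet/></termEntry>'
)
first, second = sample.findall('langSet')
print(lang_of(first))
print(lang_of(second, default='und'))
```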