I have a bunch of XML files which are using prefixes but without the corresponding namespace declaration.
Stuff like:
<tal:block tal:condition="foo">
...
</tal:block>
or:
<div i18n:domain="my-app">
...
I know where those prefixes come from, and I tried the following, but without success:
from lxml import etree as ElementTree
ElementTree.register_namespace("i18n", "http://namespaces.zope.org")
ElementTree.register_namespace("tal", "http://xml.zope.org/namespaces/tal")
with open(path) as fp:
    tree = ElementTree.parse(fp)
but lxml still chokes with:
lxml.etree.XMLSyntaxError: Namespace prefix i18n for domain on div is not defined, line 4, column 20
I know I can use ElementTree.XMLParser(recover=True), but I would like to keep the prefixes, which this method doesn't.
Any idea?
A document that uses undeclared prefixes is not namespace-well-formed XML, so a namespace-aware parser such as lxml is going to reject it.
Your best bet (other than fixing the XML files themselves) is to programmatically modify the XML source and add the namespace declarations to the root element, just using the string support in your language. Add xmlns:tal="http://xml.zope.org/namespaces/tal", and so on for each prefix, to the root element before you give the XML to the parser. The parser should then handle it without complaint, and without any namespace registration.
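The string patching described above could look roughly like this. It is only a sketch under a few assumptions: the helper name add_namespace_declarations and the NAMESPACES mapping are made up for illustration, the i18n URI is simply the one from the question, and the regex assumes the first "<" followed by a name character really is the root element's start tag (a comment or DOCTYPE containing a tag-like string before the root would fool it).

```python
import re

try:
    from lxml import etree  # the parser used in the question
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

# Hypothetical prefix-to-URI mapping; adjust the URIs to the real ones.
NAMESPACES = {
    "tal": "http://xml.zope.org/namespaces/tal",
    "i18n": "http://namespaces.zope.org",
}

# "<" followed by a name character: skips "<?xml ...?>" and "<!-- ... -->".
ROOT_TAG = re.compile(r"<[A-Za-z_][^\s/>]*")


def add_namespace_declarations(source, namespaces):
    """Return source with missing xmlns:prefix declarations added to the root tag."""

    def inject(match):
        declarations = "".join(
            f' xmlns:{prefix}="{uri}"'
            for prefix, uri in namespaces.items()
            # Skip prefixes that are already declared somewhere in the text,
            # to avoid producing a duplicate attribute on the root element.
            if f"xmlns:{prefix}=" not in source
        )
        return match.group(0) + declarations

    # count=1: only the first start tag (the root element) is touched.
    return ROOT_TAG.sub(inject, source, count=1)


broken = (
    '<div i18n:domain="my-app">'
    '<tal:block tal:condition="foo">text</tal:block>'
    "</div>"
)
patched = add_namespace_declarations(broken, NAMESPACES)
root = etree.fromstring(patched.encode("utf-8"))
```

After this, the elements and attributes carry their namespaces (for example, root[0].tag is "{http://xml.zope.org/namespaces/tal}block"). For files on disk you would read the text, patch it, and parse the patched string instead of passing the file object to the parser.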