I'm building OpenOffice documents. I have a scaffold that I use to generate my content.xml file. The content-scaffold.xml file is stored on the filesystem and looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
xmlns:db="urn:oasis:names:tc:opendocument:xmlns:database:1.0"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:odf="http://docs.oasis-open.org/ns/office/1.2/meta/odf#"
xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:pkg="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#"
xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
office:version="1.2">
<office:automatic-styles>
<style:style style:family="text" style:name="Strong">
<style:text-properties
fo:color="#000000"
fo:font-weight="bold" />
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<!-- content will go here -->
</office:text>
</office:body>
</office:document-content>
The idea is that I take this XML and inject content into the office:text tag (in Python), then render it back. In this example, I'm injecting a simple text:p tag.
document_content = ElementTree.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ElementTree.SubElement(office_text, 'text:p')
p.text = "Hello"
However, this is what the namespaces declarations look like once rendered:
<office:document-content
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
office:version="1.2">
This results in the following error:
Namespace prefix text on p is not defined
It's pretty clear that ElementTree is only keeping the xmlns declarations that are needed (in my case fo, office and style, since they are the only prefixes actually used by elements or attributes in content-scaffold.xml), and it's pretty neat. However, I really want them all, so that I can use any of the namespaces in the injected content.
Any idea how to force ElementTree to keep them all? Or am I thinking about this wrong from the start? I'm open to any alternative solutions.
Note: I'm using Python 3 and ElementTree
Thanks
ElementTree is rather weak when it comes to namespace handling. However, what you are asking for can be done (but it is a bit of a hassle):
from xml.etree import ElementTree as ET
NAMESPACES = {"anim": "urn:oasis:names:tc:opendocument:xmlns:animation:1.0",
"chart": "urn:oasis:names:tc:opendocument:xmlns:chart:1.0",
"config": "urn:oasis:names:tc:opendocument:xmlns:config:1.0",
"db": "urn:oasis:names:tc:opendocument:xmlns:database:1.0",
"dc": "http://purl.org/dc/elements/1.1/",
"dr3d": "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0",
"draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
"fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
"form": "urn:oasis:names:tc:opendocument:xmlns:form:1.0",
"grddl": "http://www.w3.org/2003/g/data-view#",
"manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
"math": "http://www.w3.org/1998/Math/MathML",
"meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
"number": "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0",
"odf": "http://docs.oasis-open.org/ns/office/1.2/meta/odf#",
"of": "urn:oasis:names:tc:opendocument:xmlns:of:1.2",
"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
"pkg": "http://docs.oasis-open.org/ns/office/1.2/meta/pkg#",
"presentation": "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0",
"script": "urn:oasis:names:tc:opendocument:xmlns:script:1.0",
"smil": "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0",
"style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
"svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
"table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
"xforms": "http://www.w3.org/2002/xforms",
"xhtml": "http://www.w3.org/1999/xhtml",
"xlink": "http://www.w3.org/1999/xlink"}
document_content = ET.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, 'text:p')
p.text = "Hello"
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # Ensure correct prefixes in output
    if prefix not in ("office", "fo", "style"):  # Prevent duplicate ns declarations
        document_content.set("xmlns:" + prefix, uri)  # Add ns declarations to root element
ET.ElementTree(document_content).write("output.xml")
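This code will create a result document with all namespace declarations present on the root element. ElementTree writes the text:p tag out literally, so it only resolves because xmlns:text is now declared there.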
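The prefix map does not have to be typed out by hand. As a minimal sketch, assuming the scaffold itself declares every prefix you need, the standard library's iterparse can report each xmlns declaration through its start-ns events. The helper name collect_namespaces and the shortened sample document are illustrative only and not part of the original code:

```python
import io
from xml.etree import ElementTree as ET


def collect_namespaces(source):
    """Return a {prefix: uri} dict of every xmlns declaration found in source.

    source may be a filename or a binary file object.
    """
    namespaces = {}
    # With events=["start-ns"], iterparse yields (event, (prefix, uri)) pairs.
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        namespaces[prefix] = uri
    return namespaces


# Shortened stand-in for content-scaffold.xml (assumption: two prefixes only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body><office:text/></office:body>
</office:document-content>"""

namespaces = collect_namespaces(io.BytesIO(SAMPLE))
```

The resulting dict could stand in for the hardcoded NAMESPACES above; passing 'content-scaffold.xml' instead of the BytesIO object reads the real file.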
Here is how it can be done with lxml:
from lxml import etree as ET
NAMESPACES = {"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0"}
document_content = ET.parse('content-scaffold.xml')
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p')
p.text = "Hello"
document_content.write("output.xml")
Note that you have to provide the element name using Clark notation ({namespace-uri}localname) in SubElement().
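Clark notation also works with the standard library's ElementTree. When a tag carries its namespace URI, the serializer emits the matching xmlns declaration on its own, so no manual xmlns attribute is needed for that element. A small sketch, where the clark helper is a made-up convenience name and the tiny tree stands in for the real scaffold:

```python
from xml.etree import ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def clark(uri, local_name):
    """Build a '{uri}local_name' tag, the form ElementTree uses internally."""
    return f"{{{uri}}}{local_name}"


# Registering the prefixes keeps 'office' and 'text' in the output
# instead of generated names such as ns0 and ns1.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)

# Stand-in for the office:text element found in the parsed scaffold.
office_text = ET.Element(clark(OFFICE_NS, "text"))
p = ET.SubElement(office_text, clark(TEXT_NS, "p"))
p.text = "Hello"

rendered = ET.tostring(office_text, encoding="unicode")
print(rendered)
```

Only namespaces that are actually used get declared this way, so this approach fits when every injected element is created with a qualified tag.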