
ElementTree: why are my namespace declarations stripped out?

I'm building OpenOffice documents. I have a scaffold that I use to generate my content.xml file. The content-scaffold.xml file is stored on the filesystem and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
  <office:document-content
  xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0"
  xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
  xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
  xmlns:db="urn:oasis:names:tc:opendocument:xmlns:database:1.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
  xmlns:grddl="http://www.w3.org/2003/g/data-view#"
  xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
  xmlns:math="http://www.w3.org/1998/Math/MathML"
  xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
  xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
  xmlns:odf="http://docs.oasis-open.org/ns/office/1.2/meta/odf#"
  xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:pkg="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#"
  xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
  xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
  xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:xforms="http://www.w3.org/2002/xforms"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  office:version="1.2">

  <office:automatic-styles>

    <style:style style:family="text" style:name="Strong">
      <style:text-properties
        fo:color="#000000"
        fo:font-weight="bold" />
    </style:style>

  </office:automatic-styles>


  <office:body>
    <office:text>
      <!-- content will go here -->
    </office:text>
  </office:body>

</office:document-content>

The idea is that I take this XML, inject content into the office:text tag (in Python), and then render it back. In this example, I'm injecting a simple text:p tag.

document_content = ElementTree.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ElementTree.SubElement(office_text, 'text:p')
p.text = "Hello"

However, this is what the namespace declarations look like once rendered:

<office:document-content 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
office:version="1.2">

This results in the following error:

Namespace prefix text on p is not defined

It's pretty clear that ElementTree only keeps the xmlns declarations that are needed (in my case fo, office and style, since they are the only prefixes actually used in content-scaffold.xml), which is pretty neat. However, I really want to keep them all so that I can use any of the namespaces.
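The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming a trimmed-down stand-in for content-scaffold.xml with only three declarations (the SCAFFOLD string is made up for illustration). It shows that the parser resolves prefixes into full URIs, and that the serializer only declares the namespaces it finds in use.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hypothetical, shortened stand-in for content-scaffold.xml.
SCAFFOLD = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">'
    '<office:body/>'
    '</office:document-content>'
)

root = ET.fromstring(SCAFFOLD)

# After parsing, the prefix is gone: the tag holds the full namespace URI.
print(root.tag)

# Without this call the serializer would invent prefixes such as ns0.
ET.register_namespace("office", OFFICE_NS)

# Only the namespace that is actually used by an element is declared again.
rendered = ET.tostring(root, encoding="unicode")
print(rendered)
```

The text and table declarations are absent from the rendered string because no element or attribute in the tree belongs to those namespaces.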

Any idea how to force ElementTree to keep them all? Or am I thinking about this the wrong way from the start? I'm open to alternative solutions.

Note: I'm using Python 3 and ElementTree

Thanks

asked Oct 30 '22 13:10 by gkpo


1 Answer

ElementTree is rather weak when it comes to namespace handling. However, what you are asking for can be done (but it is a bit of a hassle):

from xml.etree import ElementTree as ET

NAMESPACES = {"anim": "urn:oasis:names:tc:opendocument:xmlns:animation:1.0",
  "chart": "urn:oasis:names:tc:opendocument:xmlns:chart:1.0",
  "config": "urn:oasis:names:tc:opendocument:xmlns:config:1.0",
  "db": "urn:oasis:names:tc:opendocument:xmlns:database:1.0",
  "dc": "http://purl.org/dc/elements/1.1/",
  "dr3d": "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0",
  "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
  "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
  "form": "urn:oasis:names:tc:opendocument:xmlns:form:1.0",
  "grddl": "http://www.w3.org/2003/g/data-view#",
  "manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
  "math": "http://www.w3.org/1998/Math/MathML",
  "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
  "number": "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0",
  "odf": "http://docs.oasis-open.org/ns/office/1.2/meta/odf#",
  "of": "urn:oasis:names:tc:opendocument:xmlns:of:1.2",
  "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
  "pkg": "http://docs.oasis-open.org/ns/office/1.2/meta/pkg#",
  "presentation": "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0",
  "script": "urn:oasis:names:tc:opendocument:xmlns:script:1.0",
  "smil": "urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0",
  "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
  "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
  "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
  "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
  "xforms": "http://www.w3.org/2002/xforms",
  "xhtml": "http://www.w3.org/1999/xhtml",
  "xlink": "http://www.w3.org/1999/xlink"}

document_content = ET.parse('content-scaffold.xml').getroot()
office_body = document_content.find('office:body',  NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, 'text:p')
p.text = "Hello"

for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)           # Ensure correct prefixes in output
    # office, fo and style are used by the scaffold, so the serializer already
    # declares them; adding them again would produce duplicate attributes.
    if prefix not in ("office", "fo", "style"):
        document_content.set("xmlns:" + prefix, uri)   # Add ns declaration to root element

ET.ElementTree(document_content).write("output.xml")

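This code writes output.xml with all of the namespace declarations on the root element. It works because ElementTree writes a tag name without a {uri} part, such as 'text:p', and the xmlns:* attributes verbatim.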
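The NAMESPACES dictionary does not have to be typed out by hand. As a possible variation (not part of the original answer), the declarations can be collected from the scaffold itself with iterparse() and its "start-ns" event. In this sketch the helper name read_declared_namespaces and the shortened in-memory scaffold are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def read_declared_namespaces(source):
    """Return {prefix: uri} for every xmlns declaration found in source.

    A default namespace (xmlns="...") would be stored under the key ''.
    """
    declared = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        declared[prefix] = uri
    return declared


# Hypothetical stand-in for open('content-scaffold.xml', 'rb').
scaffold = io.BytesIO(
    b'<office:document-content'
    b' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    b' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    b'<office:body><office:text/></office:body>'
    b'</office:document-content>'
)

NAMESPACES = read_declared_namespaces(scaffold)
print(NAMESPACES)
```

The resulting dictionary can be passed to find() and to the register_namespace() loop in the same way as the hand-written one.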


Here is how it can be done with lxml:

from lxml import etree as ET

NAMESPACES = {"office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0"}

document_content = ET.parse('content-scaffold.xml')
office_body = document_content.find('office:body', NAMESPACES)
office_text = office_body.find('office:text', NAMESPACES)
p = ET.SubElement(office_text, '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p')
p.text = "Hello"

document_content.write("output.xml")

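Note that you have to provide the element name using Clark notation, {namespace-uri}local-name, in SubElement(). lxml keeps the namespace declarations of the parsed document, so nothing needs to be added to the root element.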
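Clark notation also works with the standard library's ElementTree, in combination with register_namespace(). The sketch below assumes a shortened in-memory scaffold instead of content-scaffold.xml. Note that, unlike lxml, the standard library will still only declare the namespaces that are in use, so this helps when the goal is a valid text:p element rather than keeping every declaration.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Bind the preferred prefixes so the serializer does not emit ns0, ns1, ...
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)

# Hypothetical, shortened stand-in for content-scaffold.xml.
root = ET.fromstring(
    f'<office:document-content xmlns:office="{OFFICE_NS}">'
    '<office:body><office:text/></office:body>'
    '</office:document-content>'
)
office_text = root.find("office:body/office:text", {"office": OFFICE_NS})

# QName(uri, local).text is the Clark form "{uri}local".
p = ET.SubElement(office_text, ET.QName(TEXT_NS, "p").text)
p.text = "Hello"

# The text namespace is now in use, so the serializer declares it on the root.
rendered = ET.tostring(root, encoding="unicode")
print(rendered)
```

Because the new element really belongs to the text namespace, the serializer adds the xmlns:text declaration by itself and the "Namespace prefix text on p is not defined" error does not occur.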

answered Nov 15 '22 05:11 by mzjn