
Why does xml package modify my xml file in Python3?

I use the xml library in Python 3.5 to read and write an XML file. I don't modify anything; I just parse the file and write it back. But the library still changes the file.

  1. Why is it modified?
  2. How can I prevent this? For example, I just want to replace a specific tag or its value in a fairly complex XML file without losing any other information.

This is the example file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
    <title>Der Eisbär</title>
    <ids>
        <entry>
            <key>tmdb</key>
            <value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
        </entry>
        <entry>
            <key>imdb</key>
            <value xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">tt0167132</value>
        </entry>
    </ids>
</movie>

This is the code

import xml.etree.ElementTree as ET
tree = ET.parse('x.nfo')
tree.write('y.nfo', encoding='utf-8')

And the xml-file becomes this

<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <title>Der Eisbär</title>
    <ids>
        <entry>
            <key>tmdb</key>
            <value xsi:type="xs:int">9321</value>
        </entry>
        <entry>
            <key>imdb</key>
            <value xsi:type="xs:string">tt0167132</value>
        </entry>
    </ids>
</movie>
  • Line 1 is gone.
  • The <movie> tag in line 2 now has an attribute.
  • The <value> tags in lines 7 and 11 now have fewer attributes.
asked Aug 31 '17 22:08 by buhtz




1 Answer

Note that "xml package" and "the xml library" are ambiguous. There are several XML-related modules in the standard library: https://docs.python.org/3/library/xml.html.

Why is it modified?

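ElementTree does not keep namespace declarations where they were. The parser expands every prefixed name to the {namespace-uri}local-name form and discards the xmlns attributes, so when the tree is written back, the serializer declares the namespaces it needs on the root element. Namespaces that aren't used in any element or attribute name are removed. That is why xmlns:xs disappears here: the xs prefix occurs only inside the attribute values xsi:type="xs:int" and xsi:type="xs:string", which ElementTree treats as plain text, so the output no longer declares a prefix those values rely on.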

Why does ElementTree do this? I don't know, but perhaps it is a way to make the implementation simpler.
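A short sketch makes the mechanism visible. It parses a trimmed, single-entry copy of the question's file (the SAMPLE string is an assumption for illustration) and prints what ElementTree actually stores: the xmlns declarations are not kept as attributes, and the prefixed attribute name is expanded to {namespace-uri}local-name, so nothing records that the value xs:int depends on the xs prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's file: one <value> element that carries
# its own namespace declarations.
SAMPLE = (
    '<movie><ids><entry><key>tmdb</key>'
    '<value xsi:type="xs:int" '
    'xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>'
    '</entry></ids></movie>'
)

root = ET.fromstring(SAMPLE)
value = root.find("ids/entry/value")

# Only one attribute survives, stored under its expanded {uri}name key;
# the two xmlns:* declarations are not attributes in the tree at all.
print(value.attrib)
# {'{http://www.w3.org/2001/XMLSchema-instance}type': 'xs:int'}

# On output, xmlns:xsi is declared on <movie> and xmlns:xs is gone,
# because "xs:int" is just an attribute value to ElementTree.
print(ET.tostring(root, encoding="unicode"))
```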

How can I prevent this? For example, I just want to replace a specific tag or its value in a fairly complex XML file without losing any other information.

I don't think there is a way to prevent this with ElementTree. The issue has been brought up before. Here are two very similar questions with no answers:

  • How do I parse and write XML using Python's ElementTree without moving namespaces around?
  • Keep Existing Namespaces when overwriting XML file with ElementTree and Python

My suggestion is to use lxml instead of ElementTree. With lxml, the namespace declarations will remain where they occur in the original file.
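A minimal sketch of the lxml approach, assuming the third-party lxml package is installed (pip install lxml). It parses a cut-down copy of the question's file, changes only the <title> text ("New title" is a hypothetical placeholder), and writes the result. The xmlns:xs and xmlns:xsi declarations stay on the <value> element, although lxml may write them before the xsi:type attribute, so the output is equivalent rather than guaranteed byte-for-byte identical. Passing standalone=True restores standalone="yes" in the declaration.

```python
# Requires the third-party lxml package: pip install lxml
import os
import tempfile

from lxml import etree

# Cut-down copy of the question's x.nfo (one <entry> only).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
    <title>Der Eisbär</title>
    <ids>
        <entry>
            <key>tmdb</key>
            <value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
        </entry>
    </ids>
</movie>
"""


def rewrite_title(src_path, dst_path, new_title):
    """Replace the <title> text and write the document back with lxml."""
    tree = etree.parse(src_path)
    tree.getroot().find("title").text = new_title
    # Write an XML declaration with the original encoding and standalone flag.
    tree.write(dst_path, xml_declaration=True, encoding="UTF-8", standalone=True)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "x.nfo")
    dst = os.path.join(tmp, "y.nfo")
    with open(src, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    rewrite_title(src, dst, "New title")  # hypothetical replacement value
    with open(dst, encoding="utf-8") as f:
        result = f.read()

print(result)
```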

Line 1 is gone.

That line is the XML declaration. It is recommended but not mandatory to have one.

If you always want an XML declaration, use xml_declaration=True in the write() method call.
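For example, a minimal sketch with a cut-down copy of the question's file, using in-memory buffers (io.BytesIO) instead of x.nfo and y.nfo. Note that ElementTree writes its own declaration rather than the original one: it uses single quotes, echoes the encoding name as you passed it, and its write() method has no option to reproduce standalone="yes".

```python
import io
import xml.etree.ElementTree as ET

# Cut-down copy of the question's file, including its original declaration.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    "<movie>\n"
    "    <title>Der Eisbär</title>\n"
    "</movie>\n"
)

tree = ET.parse(io.BytesIO(SAMPLE.encode("utf-8")))

out = io.BytesIO()
# xml_declaration=True forces a declaration even for UTF-8 output.
tree.write(out, encoding="utf-8", xml_declaration=True)
output = out.getvalue().decode("utf-8")
print(output)
```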

answered Sep 28 '22 04:09 by mzjn