So basically, I want to generate an XML file whose elements come from a Python dictionary: the dictionary's keys become the tags, and its values become the element text. I have no need to give attributes to the items, and my desired output would look something like this:
<AllItems>
  <Item>
    <some_tag> Hello World </some_tag>
    ...
    <another_tag />
  </Item>
  <Item> ... </Item>
  ...
</AllItems>
I have tried using the xml.etree.ElementTree package, by creating a tree, setting an Element "AllItems" as the root like so:
from xml.etree import ElementTree as et

def dict_to_elem(dictionary):
    item = et.Element('Item')
    for key in dictionary:
        field = et.Element(key.replace(' ',''))
        field.text = dictionary[key]
        item.append(field)
    return item

newtree = et.ElementTree()
root = et.Element('AllItems')
newtree._setroot(root)
root.append(dict_to_elem( {'some_tag':'Hello World', ...} ))
# Lather, rinse, repeat this append step as needed
with open( filename , 'w', encoding='utf-8') as file:
    newtree.write(file, encoding='unicode')
In the last two lines, I have tried omitting the encoding in the open() statement, and omitting the encoding in the write() method or changing it to 'UTF-8', and I keep getting an error saying that type str is not serializable.
So my problem: how should I go about creating a UTF-8 XML file from scratch in the format above, and is there a more robust solution using another package that will properly handle UTF-8 characters? I'm not married to ElementTree for a solution, but I would prefer not to have to create a schema. Thanks in advance for any advice/solutions!
In my opinion, ElementTree is a good choice. If you need a more capable package in the future, you can switch to the third-party lxml module, which offers a largely compatible interface.
The answer to your problem can be found in the doc http://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree.write
The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of file if it’s an open file object; make sure you do not try to write a string to a binary stream and vice versa.
Basically, you are doing it correctly. You open() the file in text mode, so the file accepts strings, and you need to pass the 'unicode' argument to tree.write(). Alternatively, you could open the file in binary mode (mode 'wb', with no encoding argument in open()) and pass 'utf-8' to tree.write().
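To make the str-versus-bytes rule concrete, here is a minimal sketch that uses in-memory streams instead of real files. The element contents and variable names are only illustrative; the point is that encoding='unicode' pairs with a text stream and encoding='utf-8' pairs with a binary stream, while mixing them fails.

```python
import io
from xml.etree import ElementTree as et

# Illustrative tree; the names and text are made up for this sketch.
root = et.Element('AllItems')
et.SubElement(root, 'Item').text = 'Hello World ☺'
tree = et.ElementTree(root)

# Text stream + encoding='unicode': write() emits str.
text_stream = io.StringIO()
tree.write(text_stream, encoding='unicode')
text_result = text_stream.getvalue()

# Binary stream + encoding='utf-8': write() emits bytes.
binary_stream = io.BytesIO()
tree.write(binary_stream, encoding='utf-8')
binary_result = binary_stream.getvalue()

# Mismatch: asking for str output on a binary stream raises TypeError.
try:
    tree.write(io.BytesIO(), encoding='unicode')
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

print(type(text_result).__name__, type(binary_result).__name__)
print('mismatch raised:', type(mismatch_error).__name__)
```

The same pairing applies to files returned by open(): mode 'w' behaves like the text stream, and mode 'wb' behaves like the binary stream.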
A bit cleaned-up code that works on its own:
#!python3
from xml.etree import ElementTree as et

def dict_to_elem(dictionary):
    item = et.Element('Item')
    for key in dictionary:
        field = et.Element(key.replace(' ', ''))
        field.text = dictionary[key]
        item.append(field)
    return item

root = et.Element('AllItems')    # create the element first...
tree = et.ElementTree(root)      # and pass it to the created tree
root.append(dict_to_elem({'some_tag': 'Hello World', 'xxx': 'yyy'}))
# Lather, rinse, repeat this append step as needed

filename = 'a.xml'
with open(filename, 'w', encoding='utf-8') as file:
    tree.write(file, encoding='unicode')

# The alternative is...
fname = 'b.xml'
with open(fname, 'wb') as f:
    tree.write(f, encoding='utf-8')
It depends on the purpose. Of the two, I personally prefer the first solution. It clearly says that you write a text file (and the XML is a text file).
But the simplest alternative, where you do not need to open the file yourself, is just to pass the file name to tree.write() like this:
tree.write('c.xml', encoding='utf-8')
It opens the file, writes the content using the given encoding (updated following Sebastian's comment), and closes the file. It is easy to read and leaves little room for mistakes.
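As a rough check of why the encoding argument matters when passing a file name, the sketch below writes the same tree twice into a temporary directory: once with encoding='utf-8' and once with no encoding at all, in which case ElementTree falls back to US-ASCII and emits non-ASCII characters as numeric character references. The file names and element contents are hypothetical.

```python
import os
import tempfile
from xml.etree import ElementTree as et

# Illustrative tree; the names and text are made up for this sketch.
root = et.Element('AllItems')
et.SubElement(root, 'Item').text = 'Hello World ☺'
tree = et.ElementTree(root)

with tempfile.TemporaryDirectory() as tmpdir:
    utf8_path = os.path.join(tmpdir, 'utf8.xml')        # hypothetical file name
    default_path = os.path.join(tmpdir, 'default.xml')  # hypothetical file name

    tree.write(utf8_path, encoding='utf-8')  # characters stored as UTF-8 bytes
    tree.write(default_path)                 # default encoding is US-ASCII

    # Read the raw bytes back to see what actually landed on disk.
    with open(utf8_path, 'rb') as f:
        utf8_bytes = f.read()
    with open(default_path, 'rb') as f:
        default_bytes = f.read()

    # Both files parse back to the same text.
    utf8_text = et.parse(utf8_path).getroot().find('Item').text
    default_text = et.parse(default_path).getroot().find('Item').text

print(utf8_bytes)
print(default_bytes)
```

Both files are valid XML and round-trip to the same text, but only the first stores the characters themselves as UTF-8.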
It shouldn't be necessary, but you can add the XML declaration explicitly if your tool doesn't understand the generated XML file:
#!/usr/bin/env python3
from xml.etree import ElementTree as etree

your_dict = {'some_tag': 'Hello World ☺'}

def add_items(root, items):
    for name, text in items:
        elem = etree.SubElement(root, name)
        elem.text = text

root = etree.Element('AllItems')
add_items(etree.SubElement(root, 'Item'),
          ((key.replace(' ', ''), value) for key, value in your_dict.items()))
tree = etree.ElementTree(root)
tree.write('output.xml', xml_declaration=True, encoding='utf-8')
This writes the following to output.xml:
<?xml version='1.0' encoding='utf-8'?>
<AllItems><Item><some_tag>Hello World ☺</some_tag></Item></AllItems>
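ElementTree writes everything on one line by default, whereas the layout sketched in the question is indented. If that matters, one option (assuming Python 3.9 or later, where xml.etree.ElementTree.indent() is available) is to indent the tree in place before serializing it. The tag names below are just the ones from the question.

```python
from xml.etree import ElementTree as et

# Build a small tree using the tag names from the question.
root = et.Element('AllItems')
item = et.SubElement(root, 'Item')
et.SubElement(item, 'some_tag').text = 'Hello World ☺'
et.SubElement(item, 'another_tag')  # empty element, no text

# indent() modifies the tree in place by adding whitespace between elements.
# Assumption: Python 3.9+; older versions do not provide this function.
et.indent(root, space='  ')

pretty = et.tostring(root, encoding='unicode')
print(pretty)
```

Because indent() changes the tree itself, a later tree.write(...) call on the same elements produces the indented form as well.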