Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an easy way to manipulate XML documents in Python?

Tags:

python

xml

I have done a little research around the matter, but haven't really been able to come up with anything useful. What I need is to not just parse and read, but actually manipulate XML documents in python, similar to the way JavaScript is able to manipulate HTML documents.

Allow me to give an example. say I have the following XML document:

<library>
    <book id=123>
        <title>Intro to XML</title>
        <author>John Smith</author>
        <year>1996</year>
    </book>
    <book id=456>
        <title>XML 101</title>
        <author>Bill Jones</author>
        <year>2000</year>
    </book>
    <book id=789>
        <title>This Book is Unrelated to XML</title>
        <author>Justin Tyme</author>
        <year>2006</year>
    </book>
</library>

I need a way both to retrieve an element, either using XPath or with a "pythonic" method, as outlined here, but I also need to be able to manipulate the document, such as below:

>>>xml.getElement('id=123').title="Intro to XML v2"
>>>xml.getElement('id=123').year="1998"

If anyone is aware of such a tool in Python, please let me know. Thanks!

like image 325
ewok Avatar asked Nov 01 '11 13:11

ewok


People also ask

How can I modify XML file?

XML files can also be edited using your computer's notepad program and even with certain word processing and spreadsheet programs. However, XML editors are considered advantageous because they are able to validate your code and ensure you remain within a valid XML structure.

How do you access XML elements in Python?

To read an XML file using ElementTree, firstly, we import the ElementTree class found inside xml library, under the name ET (common convension). Then passed the filename of the xml file to the ElementTree. parse() method, to enable parsing of our xml file. Then got the root (parent tag) of our xml file using getroot().


2 Answers

If you want to avoid installing lxml.etree, you can use xml.etree from the standard library.

Here is Acorn's answer ported to xml.etree:

import xml.etree.ElementTree as et  # was: import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.findall('fruit'): # was: tree.xpath('//fruit')
    fruit.text = 'rotten %s' % (fruit.text,)

print et.tostring(tree) # removed argument: prettyprint

note: I would have put this as a comment on Acorn's answer if I could have done so in a clear manner. If you like this answer, give the upvote to Acorn.

like image 156
Steven Rumbalski Avatar answered Oct 13 '22 04:10

Steven Rumbalski


lxml allows you to select elements using XPath, and also manipulate those elements.

import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.xpath('//fruit'):
    fruit.text = 'rotten %s' % (fruit.text,)

print et.tostring(tree, pretty_print=True)

Result:

<root>
    <fruit>rotten apple</fruit>
    <fruit>rotten pear</fruit>
    <fruit>rotten mango</fruit>
    <fruit>rotten kiwi</fruit>
</root>
like image 45
Acorn Avatar answered Oct 13 '22 03:10

Acorn