
Is there an easy way to manipulate XML documents in Python?

Tags:

python

xml

I have done a little research on this, but haven't been able to come up with anything useful. What I need is to not just parse and read, but actually manipulate XML documents in Python, similar to the way JavaScript is able to manipulate HTML documents.

Allow me to give an example. Say I have the following XML document:

<library>
    <book id="123">
        <title>Intro to XML</title>
        <author>John Smith</author>
        <year>1996</year>
    </book>
    <book id="456">
        <title>XML 101</title>
        <author>Bill Jones</author>
        <year>2000</year>
    </book>
    <book id="789">
        <title>This Book is Unrelated to XML</title>
        <author>Justin Tyme</author>
        <year>2006</year>
    </book>
</library>

I need a way both to retrieve an element, either using XPath or with a "pythonic" method, as outlined here, but I also need to be able to manipulate the document, such as below:

>>>xml.getElement('id=123').title="Intro to XML v2"
>>>xml.getElement('id=123').year="1998"

If anyone is aware of such a tool in Python, please let me know. Thanks!

325

asked Nov 01 '11 13:11

ewok

2 Answers

If you want to avoid installing lxml.etree, you can use xml.etree from the standard library.

Here is Acorn's answer ported to xml.etree:

import xml.etree.ElementTree as et  # was: import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.findall('fruit'): # was: tree.xpath('//fruit')
    fruit.text = 'rotten %s' % (fruit.text,)

print(et.tostring(tree, encoding='unicode'))  # removed argument: pretty_print
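
Applied to the library document from the question, the same approach might look like the sketch below. It assumes the id attributes are quoted (id="123"), since unquoted attribute values are not well-formed XML and the parser would reject them. The helper name set_child_text is made up for illustration and is not part of xml.etree.

```python
import xml.etree.ElementTree as et

xmltext = """
<library>
    <book id="123">
        <title>Intro to XML</title>
        <author>John Smith</author>
        <year>1996</year>
    </book>
    <book id="456">
        <title>XML 101</title>
        <author>Bill Jones</author>
        <year>2000</year>
    </book>
</library>
"""


def set_child_text(parent, tag, text):
    """Hypothetical helper: set the text of the first child named `tag`."""
    child = parent.find(tag)
    if child is None:
        raise KeyError(tag)
    child.text = text


library = et.fromstring(xmltext)

# ElementTree supports a limited XPath subset, including attribute predicates.
book = library.find("book[@id='123']")
set_child_text(book, "title", "Intro to XML v2")
set_child_text(book, "year", "1998")

result = et.tostring(library, encoding="unicode")
print(result)
```

This is close to the xml.getElement('id=123').title = ... syntax the question asks for: find() selects the element and assigning to .text changes it in place.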

Note: I would have put this as a comment on Acorn's answer if I could have done so in a clear manner. If you like this answer, give the upvote to Acorn.

156

answered Oct 13 '22 04:10

Steven Rumbalski

lxml allows you to select elements using XPath, and also manipulate those elements.

import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.xpath('//fruit'):
    fruit.text = 'rotten %s' % (fruit.text,)

print(et.tostring(tree, pretty_print=True, encoding='unicode'))

Result:

<root>
    <fruit>rotten apple</fruit>
    <fruit>rotten pear</fruit>
    <fruit>rotten mango</fruit>
    <fruit>rotten kiwi</fruit>
</root>
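
Changing text is only one kind of manipulation. The sketch below shows a few structural edits (adding an element, setting an attribute, removing an element) using calls that exist in both lxml.etree and the standard library's xml.etree.ElementTree. The fallback import and the "ripe" attribute are assumptions made for illustration, not part of the original example.

```python
try:
    import lxml.etree as et  # third-party, used if installed
except ImportError:
    import xml.etree.ElementTree as et  # standard-library fallback

xmltext = "<root><fruit>apple</fruit><fruit>pear</fruit><fruit>mango</fruit></root>"
tree = et.fromstring(xmltext)

# Add a new element at the end of the root.
kiwi = et.SubElement(tree, "fruit")
kiwi.text = "kiwi"

# Set an attribute on the first <fruit> element ("ripe" is a made-up name).
tree.find("fruit").set("ripe", "yes")

# Remove elements by content. findall() returns a list, so removing
# children from the tree while looping over that list is safe.
for fruit in tree.findall("fruit"):
    if fruit.text == "pear":
        tree.remove(fruit)

names = [fruit.text for fruit in tree.findall("fruit")]
print(names)
print(et.tostring(tree, encoding="unicode"))
```

With lxml installed, tree.xpath('//fruit') could replace findall('fruit') here; the rest of the calls are the same in both libraries.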

answered Oct 13 '22 03:10

Acorn
