
Python minidom/xml : How to set node text with minidom api

Tags: python, xml

I am currently trying to load an xml file and modify the text inside a pair of xml tags, like this:

   <anode>sometext</anode>

I currently have a helper function called getText that I use to get the text sometext above. Now I need to modify the child nodes, I guess, of a node like the one in the XML snippet above, to change sometext to othertext. The commonly suggested getText helper is shown in the footnote below.

So my question is: that is how we get the text, but how do I write a companion helper function, setText(node, 'newtext')? I'd prefer that it operate at the node level, find its way down to the child nodes on its own, and work robustly.

A previous question has an accepted answer that says "I'm not sure you can modify the DOM in place". Is that really true? Is Minidom so broken that it's effectively Read Only?


By way of footnote: to read the text between <anode> and </anode>, I was surprised that no single, simple minidom function exists, and that this small helper function is suggested in the Python xml tutorials:

import xml.dom.minidom

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# I've added this bit to make usage of the above clearer
def getTextFromNode(node):
   return getText(node.childNodes)
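
As a usage sketch, assuming the <anode> snippet from the top of the question is parsed with parseString (the names doc and anode are only illustrative), the two helpers can be exercised like this:

```python
from xml.dom.minidom import parseString

def getText(nodelist):
    # Concatenate the data of every text node in the list.
    return ''.join(n.data for n in nodelist if n.nodeType == n.TEXT_NODE)

def getTextFromNode(node):
    # Look only at the direct children of the element.
    return getText(node.childNodes)

doc = parseString("<anode>sometext</anode>")
anode = doc.getElementsByTagName("anode")[0]
text = getTextFromNode(anode)
print(text)  # sometext
```

Note that this only collects text from direct children; text nested inside child elements is not included.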

Elsewhere on Stack Overflow, I see this accepted answer from 2008:

   node[0].firstChild.nodeValue

If that's how hard it is to read with minidom, I'm not surprised to see that people say "Just don't do it!" when you ask how to write things that might modify the node structure of your XML document.

Update: The answer below shows it's not as hard as I thought.

Warren P asked Nov 27 '12 15:11


1 Answer

Actually, minidom is no more difficult to use than other DOM parsers. If you don't like it, you may want to consider complaining to the W3C.

from xml.dom.minidom import parseString

XML = """
<nodeA>
    <nodeB>Text hello</nodeB>
    <nodeC><noText></noText></nodeC>
</nodeA>
"""


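def replaceText(node, newText):
    # Only handles nodes whose first child is a text node;
    # empty elements (firstChild is None) are rejected as well.
    first = node.firstChild
    if first is None or first.nodeType != node.TEXT_NODE:
        raise Exception("node does not contain text")

    # Replaces this text node and any adjacent text nodes with newText.
    first.replaceWholeText(newText)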

def main():
    doc = parseString(XML)

    node = doc.getElementsByTagName('nodeB')[0]
    replaceText(node, "Hello World")

    print(doc.toxml())

    try:
        node = doc.getElementsByTagName('nodeC')[0]
        replaceText(node, "Hello World")
    except Exception:
        # nodeC's first child is the <noText> element, not a text node
        print("error")


if __name__ == '__main__':
    main()
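
If you want the setText(node, 'newtext') companion the question asks for, one possible sketch is below. It is not part of minidom: setText is a hypothetical helper built from standard DOM calls (removeChild, appendChild, createTextNode), and it assumes you want to replace only the element's direct text children while leaving child elements in place. Unlike replaceText above, it also works on elements that have no text yet.

```python
from xml.dom.minidom import parseString

def setText(node, newText):
    """Hypothetical helper: replace the direct text of `node` with newText."""
    # Copy the child list first, since we remove children while iterating.
    for child in list(node.childNodes):
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            node.removeChild(child)
    # Create the new text node via the owning document and attach it.
    node.appendChild(node.ownerDocument.createTextNode(newText))

doc = parseString("<root><anode>sometext</anode><empty/></root>")
anode = doc.getElementsByTagName("anode")[0]
empty = doc.getElementsByTagName("empty")[0]

setText(anode, "othertext")   # existing text is replaced
setText(empty, "filled")      # an empty element gets a new text child

print(doc.documentElement.toxml())
# <root><anode>othertext</anode><empty>filled</empty></root>
```

Because the new text node is appended, it ends up after any child elements the node already has; whether that is what you want for mixed content is a design choice you may need to adjust.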
Christian Thieme answered Nov 10 '22 00:11