Python: How do you get an XML element's text content using xml.dom.minidom?

I've called elems = xmldoc.getElementsByTagName('myTagName') on a document I parsed with minidom.parse(xmlObj). Now I'm trying to get the text content of this element, and although I spent a while looking through dir() and trying things out, I haven't found the right call yet. As an example of what I want to accomplish, in:

    <myTagName> Hello there </myTagName>

I would like to extract just "Hello there" (obviously I could parse this myself, but I expect there is some built-in functionality).
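A minimal sketch of the setup described above, assuming the element's first child is a text node. It uses minidom.parseString() instead of parse(xmlObj) only so the snippet is self-contained; the wrapping <root> element is an illustrative assumption. In the DOM, an element does not hold its text directly: the text lives in a child Text node, reachable here as firstChild, whose .data attribute is the string. This shortcut fails if the element is empty (firstChild is None) or starts with a nested element.

```python
from xml.dom import minidom

# parseString() stands in for parse(xmlObj) so the example needs no file.
xmldoc = minidom.parseString("<root><myTagName> Hello there </myTagName></root>")
elems = xmldoc.getElementsByTagName("myTagName")

# The text is stored in a child Text node, not on the element itself.
text_node = elems[0].firstChild
print(repr(text_node.data))       # ' Hello there '
print(text_node.data.strip())     # Hello there
```

Note that .data keeps the surrounding whitespace, so strip() is needed to get exactly "Hello there".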

Thanks

asked Dec 19 '10 at 21:12 by mindthief


People also ask

How do you access XML elements in Python?

To read an XML file using ElementTree, first import the xml.etree.ElementTree module, conventionally under the name ET. Then pass the filename of the XML file to ET.parse(), which parses the file and returns an ElementTree object. Finally, get the root (top-level) element of the document with getroot().
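A short sketch of those steps. To keep it self-contained, an in-memory io.BytesIO object stands in for a file on disk (ET.parse() accepts either a filename or a file object); the XML content and tag names are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for an XML file on disk.
xml_file = io.BytesIO(b"<root><myTagName> Hello there </myTagName></root>")

tree = ET.parse(xml_file)   # returns an ElementTree
root = tree.getroot()       # the root Element, here <root>

elem = root.find("myTagName")
print(root.tag)             # root
print(elem.text.strip())    # Hello there
```

With ElementTree, an element's leading text is available directly as elem.text, which is why this API is often simpler than minidom for reading text.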

What is XML DOM Minidom?

xml.dom.minidom is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is intended to be simpler than the full DOM and also significantly smaller.
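To illustrate the "API similar to other languages" point, here is a small sketch that builds a document with the standard DOM factory methods (createElement, createTextNode, appendChild), the same names used by the DOM in browser JavaScript. The element name "greeting" is an arbitrary example.

```python
from xml.dom import minidom

# Build a tiny document with the standard DOM factory methods.
doc = minidom.Document()
greeting = doc.createElement("greeting")
greeting.appendChild(doc.createTextNode("Hello there"))
doc.appendChild(greeting)

print(doc.documentElement.tagName)   # greeting
print(doc.toxml())                   # serialized document, including the XML declaration
```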

How do you parse an XML string in Python?

There are two ways to parse XML with the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document from a file (a filename or a file object) and returns an ElementTree, whereas fromstring() parses XML supplied as a string and returns the root Element directly. The string can be written with any Python string syntax; triple quotes are just a convenient way to write multi-line XML literals.
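A minimal side-by-side sketch of the two entry points, assuming the same small XML snippet for both. The main practical difference is the return type: fromstring() gives you the root Element, while parse() gives you an ElementTree that you unwrap with getroot(). An io.BytesIO object stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<root><myTagName> Hello there </myTagName></root>"

# fromstring(): XML as a string in, root Element out.
root_a = ET.fromstring(xml_text)

# parse(): filename or file object in, ElementTree out; getroot() unwraps it.
tree = ET.parse(io.BytesIO(xml_text.encode("utf-8")))
root_b = tree.getroot()

print(root_a.tag, root_b.tag)   # root root
print(root_a.find("myTagName").text == root_b.find("myTagName").text)   # True
```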


1 Answer

Wait a mo... do you want ALL the text under a given node? Then it has to involve a subtree traversal of some kind. It doesn't have to be recursive, but this works fine:

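    def get_all_text(node):
        # Text nodes carry their content directly in .data
        if node.nodeType == node.TEXT_NODE:
            return node.data
        # Any other node (element, document, ...): recurse into the children
        # and concatenate their text in document order
        text_string = ""
        for child_node in node.childNodes:
            text_string += get_all_text(child_node)
        return text_string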
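Since the traversal doesn't have to be recursive, here is a sketch of an iterative equivalent using an explicit stack, which avoids Python's recursion limit on very deep trees. The function name get_all_text_iterative and the sample XML are hypothetical, chosen for illustration. Children are pushed in reverse so the leftmost child is popped first, which keeps the text in document order.

```python
from xml.dom import minidom

def get_all_text_iterative(node):
    """Concatenate all Text node data under `node`, in document order, without recursion."""
    parts = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.nodeType == current.TEXT_NODE:
            parts.append(current.data)
        else:
            # Reverse so the first child ends up on top of the stack.
            stack.extend(reversed(current.childNodes))
    return "".join(parts)

doc = minidom.parseString("<myTagName> Hello <b>there</b>, world </myTagName>")
elem = doc.getElementsByTagName("myTagName")[0]
print(repr(get_all_text_iterative(elem)))   # ' Hello there, world '
```

Collecting pieces in a list and joining once at the end also avoids repeated string concatenation.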
answered Oct 05 '22 at 06:10 by mike rodent