Problem getting element.tagName when parsing XML with Python and xml.dom.minidom

I'm parsing an XML file with Python (xml.dom.minidom) and I can't get the tagName of a node.

The interpreter raises:

AttributeError: Text instance has no attribute 'tagName' 

when I try to extract (for example) the string 'format' from the node:

<format>DVD</format>
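As a minimal sketch of where each piece of that node lives (written for Python 3, and parsing an inline string with `xml.dom.minidom.parseString` instead of `movies.xml`, which is an assumption made so it runs on its own): the element node reports `'format'` as its `tagName`, while the text `'DVD'` is held by a separate child node of type `TEXT_NODE`.

```python
from xml.dom.minidom import parseString

# Parse only the single element from the question (inline string, not movies.xml)
doc = parseString("<format>DVD</format>")
element = doc.documentElement

# The element node carries the tag name...
print(element.tagName)  # format

# ...while the text lives in a child node of type TEXT_NODE
text_node = element.firstChild
print(text_node.nodeType == text_node.TEXT_NODE)  # True
print(text_node.data)  # DVD
```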

I have found a couple of very similar posts here on Stack Overflow, but I still can't find the solution.

I'm aware that there might be alternative modules to deal with this issue, but my intention here is to understand WHY it is failing.

Thanks a lot in advance and best regards,

Here is my code:

from xml.dom.minidom import parse
import xml.dom.minidom

# Open XML document
xml = xml.dom.minidom.parse("movies.xml")

# collection Node
collection_node = xml.firstChild

# movie Nodes
movie_nodes = collection_node.childNodes

for m in movie_nodes:

    if len(m.childNodes) > 0:
        print '\nMovie:', m.getAttribute('title')

        for tag in m.childNodes:
            print tag.tagName  # AttributeError: Text instance has no attribute 'tagName'
            for text in tag.childNodes:
                print text.data

And here is the XML:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
</collection>

Similar posts:

Get node name with minidom

Element.tagName for python not working

asked Mar 19 '15 09:03 by Manu


1 Answer

The error occurs because the newlines (and indentation) between element nodes are parsed as separate nodes of type TEXT_NODE (see Node.nodeType), and text nodes don't have a tagName attribute.
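The following sketch (Python 3; it parses a trimmed inline copy of one `<movie>` element with `parseString`, an assumption made so it runs without `movies.xml`) lists each child node of `<movie>` together with its type. The runs of newline and indentation between the child elements come back as Text nodes, and those are exactly the nodes on which `tagName` fails.

```python
from xml.dom.minidom import parseString

# A trimmed version of one <movie> from the question, keeping the newlines
# and indentation between the child elements
MOVIE_XML = """<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
</movie>"""

movie = parseString(MOVIE_XML).documentElement

# Print every child node with its type; the whitespace between elements
# shows up as Text nodes, which have no tagName attribute
for child in movie.childNodes:
    if child.nodeType == child.TEXT_NODE:
        print("TEXT_NODE   ", repr(child.data))
    else:
        print("ELEMENT_NODE", child.tagName)

# The same attribute lookup that fails in the question
first = movie.firstChild
print(hasattr(first, "tagName"))  # False: first is a whitespace Text node
```

Run against this excerpt, it should print alternating TEXT_NODE and ELEMENT_NODE lines, starting and ending with a whitespace-only text node.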

You can add a node-type check to avoid reading tagName from text nodes:

if tag.nodeType != tag.TEXT_NODE:
    print tag.tagName 
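Putting the check back into the question's loop, a minimal Python 3 sketch might look like the following. It parses an inline excerpt of `movies.xml` with `parseString` so it runs on its own, and the helper names `is_element` and `read_movies` are hypothetical. One deliberate change from the check above: it tests `nodeType == ELEMENT_NODE` rather than `!= TEXT_NODE`, which also skips comment nodes (they have no `tagName` either) in case the file ever contains any.

```python
from xml.dom.minidom import parseString

# Inline excerpt of movies.xml so the sketch runs on its own; with the real
# file you would call xml.dom.minidom.parse("movies.xml") instead
XML = """<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
</movie>
</collection>"""


def is_element(node):
    # Hypothetical helper: stricter than "!= TEXT_NODE", it also skips comments
    return node.nodeType == node.ELEMENT_NODE


def read_movies(doc):
    """Hypothetical helper: return (title, tag name, text) tuples, one per field."""
    rows = []
    for movie in doc.documentElement.childNodes:
        if not is_element(movie):
            continue  # whitespace between <movie> elements
        title = movie.getAttribute("title")
        for tag in movie.childNodes:
            if not is_element(tag):
                continue  # whitespace between the fields of one movie
            # Concatenate the tag's text children, e.g. "DVD" for <format>
            text = "".join(t.data for t in tag.childNodes
                           if t.nodeType == t.TEXT_NODE)
            rows.append((title, tag.tagName, text))
    return rows


rows = read_movies(parseString(XML))
for title, name, value in rows:
    print(title, "|", name, "|", value)
```

Keeping the filter in one small helper lets the same test guard both levels of the loop: the whitespace between `<movie>` elements and the whitespace between their fields.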
answered Oct 04 '22 07:10 by har07