
Order of elements from minidom getElementsByTagName

Is the order of the elements returned by minidom's getElementsByTagName the same as their order in the document, for elements at the same hierarchy level?

    images = svg_doc.getElementsByTagName('image') 
    image_siblings = []
    for img in images:
        if img.parentNode.getAttribute('layertype') == 'transfer':
            if img.nextSibling is not None:
                if img.nextSibling.nodeName == 'image':
                    image_siblings.append(img.nextSibling)
                elif img.nextSibling.nextSibling is not None and img.nextSibling.nextSibling.nodeName == 'image':
                    image_siblings.append(img.nextSibling.nextSibling)

I need to know whether image_siblings will contain the images in the same order in which they appear in the document, for elements at the same hierarchy level.

I found a similar question for JavaScript, but I'm unsure whether the same holds for minidom's getElementsByTagName in Python (version 3.5.2).

BuZZ-dEE asked Oct 10 '16 11:10


1 Answer

According to the code (in Python 2.7), the getElementsByTagName method relies on the _get_elements_by_tagName_helper function, whose code is:

def _get_elements_by_tagName_helper(parent, name, rc):
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE and \
            (name == "*" or node.tagName == name):
            rc.append(node)
        _get_elements_by_tagName_helper(node, name, rc)
    return rc

This means that getElementsByTagName returns sibling elements in the same order in which they appear in childNodes.

That is the whole story only if the tag name occurs at a single level. Notice the recursive call to _get_elements_by_tagName_helper inside the function: it descends into each node before moving on to that node's next sibling, so elements with the same tag name that sit deeper in the tree are interleaved with the ones at a higher level. The overall result follows a pre-order (depth-first) walk of the tree.
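If only the elements at one level are wanted, the recursion can be sidestepped by filtering childNodes directly. A minimal sketch, assuming a hypothetical helper named direct_children_by_tag (it is not part of minidom) and a made-up sample document:

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def direct_children_by_tag(parent, name):
    """Return the direct child elements of `parent` named `name`, in childNodes order.

    Hypothetical helper for illustration; minidom does not provide it.
    """
    return [
        node
        for node in parent.childNodes
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == name
    ]


# Made-up sample: one nested <image> between two top-level ones.
doc = parseString('<g><image id="a"/><g><image id="b"/></g><image id="c"/></g>')
root = doc.documentElement

# getElementsByTagName descends into nested elements as it goes.
all_ids = [e.getAttribute("id") for e in root.getElementsByTagName("image")]

# The helper only looks at the direct children of root.
top_ids = [e.getAttribute("id") for e in direct_children_by_tag(root, "image")]

print(all_ids)
print(top_ids)
```

Here all_ids includes the nested image at the position where its parent appears, while top_ids contains only the two direct children.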

If by document you mean an XML text file or a string, the question becomes whether the parser preserves the order when it creates the elements in the DOM. The parse function from xml.dom.minidom relies on the pyexpat library, which in turn uses the expat C library.
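To see the order in which the parser itself reports elements, the expat bindings can be used directly. This is a rough sketch, not what minidom does internally line for line; the handler name on_start and the sample string are made up for illustration:

```python
import xml.parsers.expat

seen = []  # (tag name, myatt value) in the order expat reports start tags


def on_start(name, attrs):
    # expat calls this once per start tag, as the tag is encountered in the input.
    seen.append((name, attrs.get("myatt")))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.Parse('<head><t myatt="1"/><o><t myatt="1.1"/></o><t myatt="2"/></head>', True)

print(seen)
```

The start-tag events arrive in the order the tags occur in the input string, which is the order a tree builder then sees them in.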

So, the short answer would be:

If the tag name occurs at only one level of the hierarchy in the XML DOM, the elements are returned in the order of that level's childNodes. If the same tag name also occurs in nodes deeper in the tree, those elements are interleaved with the higher-level ones, at the position where their ancestor appears. The order that is respected is the order of the elements in the minidom document object, which in turn depends on the parser.

Look at this example:

>>> from xml.dom.minidom import parseString
>>> s = '''<head>
...   <tagName myatt="1"/>
...   <tagName myatt="2"/>
...   <tagName myatt="3"/>
...   <otherTag>
...     <otherDeeperTag>
...       <tagName myatt="3.1"/>
...       <tagName myatt="3.2"/>
...       <tagName myatt="3.3"/>
...     </otherDeeperTag>
...   </otherTag> 
...   <tagName myatt="4"/>
...   <tagName myatt="5"/>
... </head>'''
>>> doc = parseString(s)
>>> for e in doc.getElementsByTagName('tagName'):
...     print(e.getAttribute('myatt'))
... 
1
2
3
3.1
3.2
3.3
4
5

The parser respects the order of the elements in the XML string. expat is a stream-oriented parser that reports each element as it is encountered in the input, and minidom keeps child nodes in an ordered, list-like childNodes, so there is no step at which the order would be shuffled. Note also that element order is significant in XML (it is the order of attributes within a tag that is not), and the DOM specification defines getElementsByTagName as returning elements in the order in which they are encountered in a preorder traversal of the tree, so an implementation that reordered elements would not be compliant.
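One way to check the ordering on a given document is to compare the result of getElementsByTagName with an explicit pre-order walk. A minimal sketch, assuming Python 3.3 or later (for yield from); preorder_elements is a hypothetical helper and the sample string is made up:

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def preorder_elements(node, name):
    """Yield descendant elements of `node` named `name` in pre-order.

    Hypothetical helper for illustration; not part of minidom.
    """
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            if child.tagName == name:
                yield child
            # Visit the child's subtree before moving to its next sibling.
            yield from preorder_elements(child, name)


doc = parseString(
    '<head><tagName myatt="1"/>'
    '<otherTag><tagName myatt="1.1"/></otherTag>'
    '<tagName myatt="2"/></head>'
)

expected = [e.getAttribute("myatt") for e in preorder_elements(doc, "tagName")]
actual = [e.getAttribute("myatt") for e in doc.getElementsByTagName("tagName")]

same_order = expected == actual
print(actual, same_order)
```

If same_order is ever False for a real document, that would indicate the ordering assumption does not hold for it.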

eguaio answered Sep 28 '22 11:09