Merge xml files with nested elements without external libraries

I am trying to merge multiple XML files together using Python and no external libraries. The XML files have nested elements.

Sample File 1:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
</root>

Sample File 2:

<root>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

What I Want:

<root>
  <element1>textA</element1>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

What I have tried:

From this answer.

from xml.etree import ElementTree as et
def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        return et.tostring(first)

What I Get:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

I hope you can see and understand my problem. I am looking for a proper solution, any guidance would be wonderful.

To clarify the problem: with my current solution, nested elements that share a tag are not merged; they are duplicated instead.

Inbar Rose asked Feb 14 '13 15:02



1 Answer

The code you posted appends every element from the later files to the first root, regardless of whether an element with the same tag already exists. There is no standard way to merge XML files like this, so you need to iterate over the elements yourself and check and combine them the way you see fit. I can't explain it better than code, so here it is, more or less commented:

from xml.etree import ElementTree as et

class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)

if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    # et.tostring() returns bytes on Python 3; decode for readable output
    print(r.decode())
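
To try the approach without creating files, here is a minimal Python 3 sketch of the same tag-matching merge, run on the two sample documents from the question as in-memory strings. The names `merge_into` and `combine_strings` are hypothetical helpers introduced only for this illustration, and the sketch assumes that sibling elements have unique tags (with repeated tags, only the last one would be matched).

```python
from xml.etree import ElementTree as et

SAMPLE_1 = """<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
</root>"""

SAMPLE_2 = """<root>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>"""


def merge_into(one, other):
    """Merge the children of `other` into `one`, matching children by tag."""
    # Assumption: sibling tags are unique; repeated tags collapse to the last one.
    mapping = {el.tag: el for el in one}
    for el in other:
        match = mapping.get(el.tag)
        if match is None:
            # No child with this tag yet: add the whole subtree
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: the later document's text wins
            match.text = el.text
        else:
            # Nested element: merge its children recursively
            merge_into(match, el)


def combine_strings(documents):
    """Hypothetical helper: merge XML strings and return the merged root."""
    roots = [et.fromstring(doc) for doc in documents]
    for root in roots[1:]:
        merge_into(roots[0], root)
    return roots[0]


merged = combine_strings([SAMPLE_1, SAMPLE_2])
print(et.tostring(merged, encoding='unicode'))
```

Note that new elements are appended at the end of their parent, so the merged root holds `element1`, `elements`, `element2` in that order, which differs slightly from the ordering shown in the question's desired output.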
jadkik94 answered Nov 16 '22 03:11
