
Edit XML file text based on path

I have an XML file (e.g. jerry.xml) which contains some data as given below.

<data>
<country name="Peru">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency> 
    <gdppc month="06">141100</gdppc>
    <gdpnp month="10">2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency> 
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">1.9e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
</country>
</data>

I extracted the full paths of selected text nodes from the XML above using the code below. The reasons are given in this post.

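import xml.etree.ElementTree as ET

def extractNumbers(path, node):
    """Return (path, number) pairs for every numeric text node under node."""
    nums = []

    # Skip elements (and their subtrees) whose month is 05 or 06.
    if 'month' in node.attrib:
        if node.attrib['month'] in ['05', '06']:
            return nums

    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']

    # Check for 'month' (not 'year'), since that is the attribute read below.
    elif 'month' in node.keys():
        path += ' ' + 'month' + '=' + node.attrib['month']
    try:
        num = float(node.text)
        nums.append( (path, num) )
    except (ValueError, TypeError):
        # The text is missing or is not a number.
        pass
    for e in list(node):
        nums.extend( extractNumbers(path, e) )
    return nums

tree = ET.parse('jerry.xml')
nums = extractNumbers('', tree.getroot())
print(len(nums))
print(nums)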

This gives me the location of the elements I need to change, as shown in column 1 of the CSV below (e.g. hrong.csv).

Path                                              Text1      Text2   Text3   Text4   Text5
'/data/country name=singapore/gdpnp month=08';    5.2e-015;  2e-05;  8e-06;  9e-04;  0.4e-05;
'/data/country name=peru/gdppc month=06';         0.04;      0.02;   0.15;   3.24;   0.98;

I would like to replace the text of the elements in the original XML file (jerry.xml) with the values in column 2 of hrong.csv above, based on the element locations given in column 1.

I am new to Python and realize I might not be using the best approach. I would appreciate any pointers in the right direction. Basically, I need to parse only some selected text nodes of an XML file, modify them, and save each file.

Thanks

Mia asked Apr 01 '15 02:04


2 Answers

You should be able to use the XPath capabilities of the module to do this:

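import xml.etree.ElementTree as ET

tree = ET.parse('jerry.xml')
root = tree.getroot()
# Example replacement value: column 2 of the singapore row in hrong.csv.
csv_value = '5.2e-015'

# Attribute values are matched case-sensitively, so the predicate uses
# 'Singapore' as spelled in jerry.xml, not the lowercase form from the CSV.
for data in root.findall(".//country[@name='Singapore']/gdpnp[@month='08']"):
    data.text = csv_value

tree.write("filename.xml")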

So you need to rewrite each path in the CSV to match the XPath subset that the module supports (see Supported XPath rules).
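As a rough sketch of that rewriting step, the code below walks a CSV path one step at a time instead of building an XPath string. It compares attribute values case-insensitively, since the CSV has "singapore" while the XML has "Singapore". The helper names `parse_step` and `find_by_csv_path`, the inline sample data, and the assumption that names contain no "/" are all illustrative and not part of the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed stand-ins for jerry.xml and hrong.csv (layouts assumed from the question).
XML_TEXT = """<data>
<country name="Singapore">
    <gdppc month="05">59900</gdppc>
    <gdpnp month="08">1.9e-015</gdpnp>
</country>
</data>"""
CSV_TEXT = "'/data/country name=singapore/gdpnp month=08'; 5.2e-015; 2e-05;\n"


def parse_step(step):
    """Split 'country name=singapore' into ('country', 'name', 'singapore')."""
    tag, _, attr = step.partition(' ')
    key, _, value = attr.partition('=')
    return tag, key, value


def find_by_csv_path(root, csv_path):
    """Return the elements addressed by one CSV path (hypothetical helper)."""
    cleaned = csv_path.strip().strip("'").strip('/')
    steps = [parse_step(s) for s in cleaned.split('/')]
    # The first step names the root element itself.
    if not steps or steps[0][0] != root.tag:
        return []
    matches = [root]
    for tag, key, value in steps[1:]:
        # Keep direct children with the right tag and, when the step has an
        # attribute, a case-insensitively equal attribute value.
        matches = [
            child
            for node in matches
            for child in node.findall(tag)
            if not key or child.get(key, '').lower() == value.lower()
        ]
    return matches


root = ET.fromstring(XML_TEXT)
for row in csv.reader(io.StringIO(CSV_TEXT), delimiter=';'):
    if len(row) < 2:  # skips a header or blank line without ';' separators
        continue
    for element in find_by_csv_path(root, row[0]):
        element.text = row[1].strip()  # column 2 holds the replacement text

print(ET.tostring(root, encoding='unicode'))
```

With real files, `ET.parse('jerry.xml')` and `open('hrong.csv', newline='')` would replace the inline strings, and `tree.write(...)` would save the result.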

rfkortekaas answered Oct 13 '22 06:10


First of all, see the documentation on how to modify an XML file. Now, here is my own example:

import xml.etree.ElementTree as ET

s = """
<root>
    <parent attribute="value">
        <child_1 other_attr="other_value">child text</child_1>
        <child_2 yet_another_attr="another_value">more child text</child_2>
    </parent>
</root>
"""

root = ET.fromstring(s)

# Iterate over elements directly; getchildren() was removed in Python 3.9.
for parent in root:
    parent.attrib['attribute'] = 'new value'
    for child in parent:
        child.attrib['new_attrib'] = 'new attribute for {}'.format(child.tag)
        child.text += ', appended text!'

>>> ET.dump(root)
<root>
    <parent attribute="new value">
        <child_1 new_attrib="new attribute for child_1" other_attr="other_value">child text, appended text!</child_1>
        <child_2 new_attrib="new attribute for child_2" yet_another_attr="another_value">more child text, appended text!</child_2>
    </parent>
</root>

You can do this with XPath as well:

>>> root.find('parent/child_1[@other_attr]').attrib['other_attr'] = 'found it!'
>>> ET.dump(root)
<root>
    <parent attribute="new value">
        <child_1 new_attrib="new attribute for child_1" other_attr="found it!">child text, appended text!</child_1>
        <child_2 new_attrib="new attribute for child_2" yet_another_attr="another_value">more child text, appended text!</child_2>
    </parent>
</root>
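One caveat with the `find` call above: it returns `None` when nothing matches, so chaining `.attrib[...]` onto it raises an error for a path that does not exist. A minimal sketch of a guarded version follows; `set_attribute` is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><parent>'
    '<child_1 other_attr="other_value">child text</child_1>'
    '</parent></root>'
)


def set_attribute(root, xpath, key, value):
    """Set key=value on the first element matching xpath; return True on success."""
    element = root.find(xpath)
    if element is None:  # find() returns None when no element matches
        return False
    element.set(key, value)
    return True


# The first path exists in the sample document; the second does not.
found = set_attribute(root, 'parent/child_1[@other_attr]', 'other_attr', 'found it!')
missing = set_attribute(root, 'parent/child_3[@other_attr]', 'other_attr', 'unused')
print(found, missing)
```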
Inbar Rose answered Oct 13 '22 05:10