
How do I search for a tag in an XML file using ElementTree when a certain "parent" tag has a specific value? (Python)

I just started learning Python and have to write a program that parses XML files. I have to find a certain tag called OrganisationReference in two different files and return it. There are actually multiple tags with this name, but only one of them, the one I am trying to return, has the tag OrganisationType with the value DEALER as a parent tag (I'm not sure whether that is the right term). I tried to use ElementTree for this. Here is the code:

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    tree2 = ET.parse('Master2.xml')
    root2 = tree2.getroot()

    for OrganisationReference in root1.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

    for OrganisationReference in root2.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

But this prints nothing (and raises no error). Can somebody help me?

My file looks like this:

  <MessageOrganisationCount>a</MessageOrganisationCount>
  <MessageVehicleCount>x</MessageVehicleCount>
  <MessageCreditLineCount>y</MessageCreditLineCount>
  <MessagePlanCount>z</MessagePlanCount>
  <OrganisationData>
      <Organisation>
          <OrganisationId>
              <OrganisationType>DEALER</OrganisationType>
              <OrganisationReference>WHATINEED</OrganisationReference>
          </OrganisationId>
          <OrganisationName>XYZ.</OrganisationName>
 ....

Because OrganisationReference appears several more times in this file with different text between its start and end tags, I want to get exactly the one you see in line 9: it has OrganisationId as its parent tag, and OrganisationType with the text DEALER is also a child tag of OrganisationId.

Jani asked Jan 25 '19 08:01



1 Answer

You were very close with your original attempt. You just need a couple of changes to your XPath and one small change to your Python.

The first part of your XPath starts with ./Organisation. Since you're evaluating the XPath from the root element, it expects Organisation to be a child of the root. It's not; it's a descendant.

Try changing ./Organisation to .//Organisation. (In XPath, // is short for /descendant-or-self::node()/.)
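
To make the child-versus-descendant difference concrete, here is a minimal sketch. It assumes a trimmed copy of the sample parsed from an inline string and wrapped in a doc root element (an assumption, since the question does not show the real root):

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <doc> wrapper is an assumption so the XML is well-formed.
XML = """
<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationName>XYZ.</OrganisationName>
        </Organisation>
    </OrganisationData>
</doc>
"""

root = ET.fromstring(XML)

# "./Organisation" only looks at direct children of the root element.
direct_children = root.findall("./Organisation")

# ".//Organisation" searches every level beneath the root element.
descendants = root.findall(".//Organisation")

print(len(direct_children))  # 0
print(len(descendants))      # 1
```

The first search comes back empty without raising an error, which matches the silent behaviour described in the question.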

The second issue is with OrganisationId/[@OrganisationType='DEALER']. That's invalid XPath. The / between OrganisationId and the predicate should be removed.

Also, @ is abbreviated syntax for the attribute:: axis and OrganisationType is an element, not an attribute.

Try changing OrganisationId/[@OrganisationType='DEALER'] to OrganisationId[OrganisationType='DEALER'].
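
A short sketch of the two predicate forms side by side may help. The second Organisation block (type OTHER, reference NOTTHIS) is made up for illustration and is not from the question's file:

```python
import xml.etree.ElementTree as ET

# The OTHER / NOTTHIS entry is a hypothetical second organisation.
XML = """
<OrganisationData>
    <Organisation>
        <OrganisationId>
            <OrganisationType>DEALER</OrganisationType>
            <OrganisationReference>WHATINEED</OrganisationReference>
        </OrganisationId>
    </Organisation>
    <Organisation>
        <OrganisationId>
            <OrganisationType>OTHER</OrganisationType>
            <OrganisationReference>NOTTHIS</OrganisationReference>
        </OrganisationId>
    </Organisation>
</OrganisationData>
"""

root = ET.fromstring(XML)

# [@OrganisationType='DEALER'] tests an attribute, which no element here has.
by_attribute = root.findall(".//OrganisationId[@OrganisationType='DEALER']")

# [OrganisationType='DEALER'] tests the text of a child element instead.
by_child_text = root.findall(".//OrganisationId[OrganisationType='DEALER']")

print(len(by_attribute))  # 0
print([e.findtext("OrganisationReference") for e in by_child_text])  # ['WHATINEED']
```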

The Python issue is with print(OrganisationReference.attrib). The OrganisationReference element doesn't have any attributes, just text.

Try changing print(OrganisationReference.attrib) to print(OrganisationReference.text).
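
As a quick illustration of the difference, assuming the element is parsed on its own from a string:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<OrganisationReference>WHATINEED</OrganisationReference>")

# .attrib is a dict of XML attributes; this element has none.
print(elem.attrib)  # {}

# .text is the character data between the start and end tags.
print(elem.text)    # WHATINEED
```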

Here's an example using just one XML file for demo purposes...

XML Input (Master1.xml; with doc element added to make it well-formed)

<doc>
    <MessageOrganisationCount>a</MessageOrganisationCount>
    <MessageVehicleCount>x</MessageVehicleCount>
    <MessageCreditLineCount>y</MessageCreditLineCount>
    <MessagePlanCount>z</MessagePlanCount>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
            <OrganisationName>XYZ.</OrganisationName>
        </Organisation>
    </OrganisationData>
</doc>

Python (Master1.xml in the same directory)

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')
root1 = tree1.getroot()

for OrganisationReference in root1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)

Printed Output

WHATINEED

Also note that you don't appear to need getroot() at all. You can call findall() directly on the tree...

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')

for OrganisationReference in tree1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)
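
Since the question asks to return the value from two files, one possible way to package this is a small helper. This is only a sketch: find_dealer_reference and DEALER_PATH are hypothetical names, and an in-memory io.StringIO stands in for a real file so the example runs on its own.

```python
import io
import xml.etree.ElementTree as ET

# Same XPath as above, kept in one place (the name is hypothetical).
DEALER_PATH = (
    ".//Organisation/OrganisationId[OrganisationType='DEALER']"
    "/OrganisationReference"
)


def find_dealer_reference(source):
    """Return the text of the first DEALER OrganisationReference, or None.

    source is a file name or an open file object, as accepted by ET.parse().
    """
    tree = ET.parse(source)
    # findtext() returns the text of the first match, or None if nothing matches.
    return tree.findtext(DEALER_PATH)


# In-memory stand-in for a file such as 'Master1.xml'.
sample = io.StringIO(
    "<doc><OrganisationData><Organisation><OrganisationId>"
    "<OrganisationType>DEALER</OrganisationType>"
    "<OrganisationReference>WHATINEED</OrganisationReference>"
    "</OrganisationId></Organisation></OrganisationData></doc>"
)

print(find_dealer_reference(sample))  # WHATINEED
```

With the real files this would be called as find_dealer_reference('Master1.xml') and find_dealer_reference('Master2.xml'). Note that findtext() only returns the first match; keep findall() if several DEALER entries are possible.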
Daniel Haley answered Oct 10 '22 21:10
