
Extract text between XML tags in Python

Tags: python, xml

I have the XML string below and am trying to print the text between the domain, receive_time, serial and seqno tags for each entry tag.

xml="""
<response status="success" code="19"><result><msg><line>query job enqueued with jobid 19032</line></msg><job>19032</job></result></response>
19032
<response status="success"><result>
  <job>
    <tenq>14:10:09</tenq>
    <tdeq>14:10:09</tdeq>
    <tlast>19:00:00</tlast>
    <status>ACT</status>
    <id>19032</id>
    <cached-logs>64</cached-logs>
  </job>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
      </logs>
  </log>
</result></response>
"""

I am using xml.etree.ElementTree. To get the text inside the domain tag I tried node.attrib.get('domain') and node.get('domain'). Please advise.

import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print(node)

It can be another Python library too; it does not have to be xml.etree. I do not want to print the text between tags blindly: I need to print the tag name followed by its text, i.e.:

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120

etc
irom asked Nov 26 '17 19:11


1 Answer

Iterate over the entry elements with iter() first, then loop over the elements inside each entry. The tag attribute and the text attribute of each element should fetch the details you are looking for -

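import xml.etree.ElementTree as ET
tree = ET.fromstring(xml)
for node in tree.iter('entry'):
    print('\n')  # blank lines between entries
    # node.iter() yields the entry itself first, then its descendants
    for elem in node.iter():
        if elem.tag != node.tag:  # skip the entry element itself
            print("{}: {}".format(elem.tag, elem.text))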

Hope this helps!

Output -

domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120


domain: 1
receive_time: 2017/11/26 14:10:08
serial: 007901004140
seqno: 10156449120
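
As a side note on why node.get('domain') comes back empty: get() and attrib only read attributes written on the opening tag (such as logid), while domain is a child element, so its text has to be read from the child. The sketch below shows both, and limits the output to the four tags named in the question. It assumes a trimmed copy of the second response only, because ET.fromstring() accepts a single root element and the string as pasted in the question holds two response documents plus a bare job ID. SAMPLE, WANTED and entry_fields are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the second <response> from the question. ET.fromstring()
# accepts exactly one root element, so the first <response> line and the
# bare job ID that precede it in the pasted string are left out here.
SAMPLE = """
<response status="success"><result>
  <log>
    <logs count="20" progress="29">
      <entry logid="2473601">
        <domain>1</domain>
        <receive_time>2017/11/26 14:10:08</receive_time>
        <serial>007901004140</serial>
        <seqno>10156449120</seqno>
      </entry>
    </logs>
  </log>
</result></response>
"""

# Tags named in the question.
WANTED = ("domain", "receive_time", "serial", "seqno")


def entry_fields(xml_text, wanted=WANTED):
    """Return one {tag: text} dict per <entry> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        # findtext() returns the text of the first matching child,
        # or None when the entry has no such child.
        rows.append({tag: entry.findtext(tag) for tag in wanted})
    return rows


root = ET.fromstring(SAMPLE)
first = next(root.iter("entry"))
print(first.get("logid"))   # attribute on the <entry> tag itself
print(first.get("domain"))  # None: domain is a child element, not an attribute

for row in entry_fields(SAMPLE):
    for tag, text in row.items():
        print("{}: {}".format(tag, text))
```

Compared with looping over node.iter(), looking the tags up by name keeps the output stable if an entry later gains extra children or nested elements.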
Vivek Kalyanarangan answered Sep 28 '22 17:09