
Python - How to nest file read loops?

Two days ago I was introduced to Python (and to programming in general). Today I'm stuck. I've spent hours trying to find an answer to what I suspect is a problem so trivial that nobody else has ever been stuck on it : )

The boss wants me to manually clean up HUGE .xml files into something more human-readable, so I'm trying to write a script to do it for me. Below is an example of the .xml file and my desired output.

Input (File.xml):

<IssueTracking>
  <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
  </Issue>
  <Issue>
    <SequenceNum>124</SequenceNum>
    <Subject>Subject of Ticket 124</Subject>
    <Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
  </Issue>
</IssueTracking>

Desired Output:

123    Subject of Ticket 123
Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.

124    Subject of Ticket 124
Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.

Here is what I've got so far.

with open('File.xml', 'r') as SourceFile: # Opens the file (the name must be a quoted string)
    while True: # Keep going through the file to the end
        SourceFileLine = SourceFile.readline() # Reads the next line of the source file
        if not SourceFileLine: # readline() returns '' only at end of file, so stop
            break

        SourceFileLine = SourceFileLine.strip() # Strips the whitespace

        if "<SequenceNum>" in SourceFileLine:
            SequenceNum = SourceFileLine[13:-14]  # Trims the tags, saves the field.
            continue

        if "<Subject>" in SourceFileLine:
            Subject = SourceFileLine[9:-10]
            continue

        #if "<Description>" in SourceFileLine:
        #    last_pos = SourceFile.tell() 
        #    while "</Description>" not in SourceFileLine:
        #        SourceFile.seek(last_pos)
        #        ?????
        #    
        #    Description = Description[22:]
        #    continue

        if "</Issue>" in SourceFileLine:
            print(SequenceNum, end = "\t")
            print(Subject)
        #    print(Description)
            print() # One blank line between issues; print("\n") would print two

I'm stuck on identifying the three lines between the <Description> tags and retaining them as a single string I can print before continuing down the source file. Having scanned dozens of other examples of file-reading loops, I suspect what I need is to flag the point where I reach that field and nest another read loop at that point in the file. But I haven't found an example of this being done, so I assume I'm missing something basic or there is a better way. Thanks in advance for any help!
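The nested read loop described above does work in Python, because a file object is an iterator: a `for` loop nested inside another `for` loop over the same object continues from the current line, and the outer loop then resumes after the last line the inner one consumed. Here is a minimal sketch of that idea, assuming every tag starts its own line as in the sample (`format_issues` is a hypothetical name, and the function accepts any iterable of lines so it can be tried on a string as well as on an open file):

```python
import io


def format_issues(lines):
    """Yield one formatted text block per <Issue> element.

    Hypothetical helper. Assumes the layout of the sample File.xml: one tag
    per line, except that <Description> text may run over several lines.
    This is plain string matching, not real XML parsing (entities such as
    &amp; are left as-is).
    """
    it = iter(lines)  # a file object is already its own iterator
    seq = subject = description = ""
    for raw in it:
        line = raw.strip()
        if line.startswith("<SequenceNum>"):
            seq = line[len("<SequenceNum>"):-len("</SequenceNum>")]
        elif line.startswith("<Subject>"):
            subject = line[len("<Subject>"):-len("</Subject>")]
        elif line.startswith("<Description>"):
            desc_lines = [line[len("<Description>"):]]
            if "</Description>" not in desc_lines[0]:
                # Nested loop over the SAME iterator: it starts at the next
                # line, and the outer loop later resumes after the last line
                # consumed here.
                for inner in it:
                    desc_lines.append(inner.strip())
                    if "</Description>" in inner:
                        break
            desc_lines[-1] = desc_lines[-1].replace("</Description>", "")
            description = "\n".join(desc_lines)
        elif line == "</Issue>":
            yield f"{seq}\t{subject}\n{description}\n"


sample = """<IssueTracking>
  <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
  </Issue>
  <Issue>
    <SequenceNum>124</SequenceNum>
    <Subject>Subject of Ticket 124</Subject>
    <Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
  </Issue>
</IssueTracking>
"""

for block in format_issues(io.StringIO(sample)):
    print(block)
```

With the real file, the same loop would run inside `with open('File.xml') as f:` as `for block in format_issues(f): print(block)`. Because this matches lines rather than parsing XML, it breaks as soon as the layout changes, which is why a real XML parser is usually the safer choice.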

phlogiston, asked Jul 20 '12 at 19:07


1 Answer

Here's an example using lxml, which I highly recommend for processing your data. (NB: written for Py2.x, but easy to adapt for Py3.x.)

from lxml import etree
xml = """<IssueTracking>
  <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
  </Issue>
  <Issue>
    <SequenceNum>124</SequenceNum>
    <Subject>Subject of Ticket 124</Subject>
    <Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
  </Issue>
</IssueTracking>
"""

root = etree.fromstring(xml)
for issue in root.findall('Issue'):
    as_list = [issue.find(n).text for n in ('SequenceNum', 'Subject', 'Description')]
    as_list[2] = as_list[2].split('\n')
    print(as_list)  # parenthesised form runs under both Python 2 and Python 3

Prints:

['123', 'Subject of Ticket 123', ['Line 1 in Description field of Ticket 123.', 'Line 2 in Description field of Ticket 123.', 'Line 3 in Description field of Ticket 123.']]
['124', 'Subject of Ticket 124', ['Line 1 in Description field of Ticket 124.', 'Line 2 in Description field of Ticket 124.', 'Line 3 in Description field of Ticket 124.']]
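The answer's code parses an in-memory string with lxml and prints Python lists. If lxml isn't installed, Python 3's standard-library `xml.etree.ElementTree` offers a largely compatible API, and its `iterparse()` reads the input incrementally, which may matter for the HUGE files mentioned in the question. A minimal sketch under those assumptions that produces the asker's desired text layout (`issue_blocks` is a hypothetical name; `source` can be a filename such as `'File.xml'` or a binary file object):

```python
import io
import xml.etree.ElementTree as ET


def issue_blocks(source):
    """Yield one formatted text block per <Issue> element.

    Hypothetical helper. `source` may be a filename such as 'File.xml' or a
    binary file object. iterparse() reads the input incrementally instead of
    building the whole tree before returning.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Issue":
            # findtext() returns the default instead of raising if a tag is missing
            seq = elem.findtext("SequenceNum", default="")
            subject = elem.findtext("Subject", default="")
            description = elem.findtext("Description", default="")
            yield f"{seq}\t{subject}\n{description}\n"
            root.clear()  # drop finished <Issue> elements so memory stays small


data = b"""<IssueTracking>
  <Issue>
    <SequenceNum>123</SequenceNum>
    <Subject>Subject of Ticket 123</Subject>
    <Description>Line 1 in Description field of Ticket 123.
Line 2 in Description field of Ticket 123.
Line 3 in Description field of Ticket 123.</Description>
  </Issue>
  <Issue>
    <SequenceNum>124</SequenceNum>
    <Subject>Subject of Ticket 124</Subject>
    <Description>Line 1 in Description field of Ticket 124.
Line 2 in Description field of Ticket 124.
Line 3 in Description field of Ticket 124.</Description>
  </Issue>
</IssueTracking>
"""

for block in issue_blocks(io.BytesIO(data)):
    print(block)
```

For a small file, `ET.parse('File.xml').getroot()` combined with the `findall('Issue')` loop from the answer is simpler; clearing the root after each `<Issue>` only pays off once the file no longer fits comfortably in memory.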
Jon Clements, answered Sep 28 '22 at 02:09