Accessing x+1 element with 'for x in list' in Python

Question

I'm trying to parse a newline-delimited text file into blocks of lines, which are appended to a .txt file. I'd also like to grab a fixed number of lines AFTER my ending string. These lines vary in content, so changing the 'end string' to try to match them would miss lines.

Example of file:

"Start"
"..."
"..."
"..."
"..."
"---" ##End here
"xxx" ##Unique data here
"xxx" ##And here

And here's the code:

first = "Start"
first_end = "---"

with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    for line in infile:
        if line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            ##Want to also write next 2 lines here
        elif copy:
            outfile.write(line)

Is there any way to do this using for line in infile, or do I need to use a different type of loop?

Kevin · Accepted Answer

You can use next (or, in Python 3 and up, readline) to retrieve the next line in the file:

    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(next(infile))
        outfile.write(next(infile))

or

    #note: not compatible with Python 2.7 and below
    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(infile.readline())
        outfile.write(infile.readline())

Either version also advances the file pointer by two additional lines, so the next iteration of for line in infile: will skip past the two lines you read with next or readline.
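One caveat with calling next directly: if the file ends before both extra lines have been read, next(infile) raises StopIteration. Here is a minimal sketch of one way to guard against that, using itertools.islice to take at most two more lines. The function name copy_blocks and its parameters are made up for illustration, and io.StringIO stands in for the real log and output files:

```python
import io
from itertools import islice

def copy_blocks(infile, outfile, start="Start", end="---", extra=2):
    """Copy each start..end block plus up to `extra` lines after the end marker."""
    copy = False
    for line in infile:
        stripped = line.strip()
        if stripped.startswith(start):
            copy = True
            outfile.write(line)
        elif stripped.startswith(end):
            copy = False
            outfile.write(line)
            # islice pulls from the same iterator the for loop uses, so these
            # lines are consumed here, and it stops quietly at end of file.
            outfile.writelines(islice(infile, extra))
        elif copy:
            outfile.write(line)

# In-memory stand-ins for testlog.log and parsed.txt (illustrative data only).
source = io.StringIO(
    "Start\nfoo\n---\nxxx\nyyy\nskipped\nStart\nbar\n---\nonly one\n"
)
result = io.StringIO()
copy_blocks(source, result)
print(result.getvalue(), end="")
```

The second block in the sample has only one line after "---", and the function simply writes what is available instead of raising.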

Bonus terminology nitpick: a file object is not a list, and methods for accessing the x+1th element of a list might not work for accessing the next line of a file, and vice versa. If you did want to access the next item of a proper list object, you could use enumerate so you can perform arithmetic on the list's index. For example:

seq = ["foo", "bar", "baz", "qux", "troz", "zort"]

#find all instances of "baz" and also the first two elements after "baz"
for idx, item in enumerate(seq):
    if item == "baz":
        print(item)
        print(seq[idx+1])
        print(seq[idx+2])

Note that, unlike readline, indexing will not advance the iterator, so for idx, item in enumerate(seq): will still iterate over "qux" and "troz".
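Direct indexing like seq[idx+2] raises IndexError when the match sits within two items of the end of the list. A small sketch of a slicing variant that avoids this follows; the helper name with_following is hypothetical, and the sample list adds a second "baz" near the end to show the short case:

```python
seq = ["foo", "bar", "baz", "qux", "troz", "baz", "zort"]

def with_following(items, target, count=2):
    """Return each occurrence of `target` with up to `count` items after it."""
    groups = []
    for idx, item in enumerate(items):
        if item == target:
            # Slicing never raises IndexError; near the end of the list it
            # just returns fewer items.
            groups.append(items[idx:idx + count + 1])
    return groups

groups = with_following(seq, "baz")
print(groups)
```

The first group holds "baz" and the two items after it; the second holds only "baz" and "zort" because the list ends there.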

An approach that works on any iterable is to use an additional variable to keep track of state across iterations. The advantage of this is that you don't have to know anything about how to manually advance iterables; the disadvantage is that reasoning about the logic within the loop is more difficult because it exposes an additional side-effect.

first = "Start"
first_end = "---"

with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
    copy = False
    num_items_to_write = 0
    for line in infile:
        if num_items_to_write > 0:
            outfile.write(line)
            num_items_to_write -= 1
        elif line.strip().startswith(first):
            copy = True
            outfile.write(line)
        elif line.strip().startswith(first_end):
            copy = False
            outfile.write(line)
            num_items_to_write = 2
        elif copy:
            outfile.write(line)
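To illustrate the "works on any iterable" point, the same state-tracking logic can be wrapped in a generator and fed a list, a generator, or a file object. This is only a sketch; extract_blocks and its parameter names are invented for the example:

```python
def extract_blocks(lines, start="Start", end="---", extra=2):
    """Yield lines from each start..end block plus `extra` lines after the end."""
    copy = False
    remaining = 0  # trailing lines still owed after the most recent end marker
    for line in lines:
        stripped = line.strip()
        if remaining > 0:
            remaining -= 1
            yield line
        elif stripped.startswith(start):
            copy = True
            yield line
        elif stripped.startswith(end):
            copy = False
            remaining = extra
            yield line
        elif copy:
            yield line

sample = ["Start", "foo", "---", "xxx", "yyy", "ignored",
          "Start", "bar", "---", "zzz"]
selected = list(extract_blocks(sample))
print(selected)
```

With a real file you could write outfile.writelines(extract_blocks(infile)), since the generator never indexes or manually advances its input.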

In the specific case of pulling repetitive groups of data out of a delimited file, it might be appropriate to skip iteration entirely and use regex instead. For data like yours, that might look like:

import re

with open("testlog.log") as file:
    data = file.read()

pattern = re.compile(r"""
^Start$                 #"Start" by itself on a line
(?:\n.*$)*?             #zero or more lines, matched non-greedily
                        #use (?:) for all groups so `findall` doesn't capture them later
\n---$                  #"---" by itself on a line
(?:\n.*$){2}            #exactly two lines
""", re.MULTILINE | re.VERBOSE)

#equivalent one-line regex:
#pattern = re.compile(r"^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)

for group in pattern.findall(data):
    print("Found group:")
    print(group)
    print("End of group.\n\n")

When run on a log that looks like:

Start
foo
bar
baz
qux
---
troz
zort
alice
bob
carol
dave
Start
Fred
Barney
---
Wilma
Betty
Pebbles

... this will produce the output:

Found group:
Start
foo
bar
baz
qux
---
troz
zort
End of group.


Found group:
Start
Fred
Barney
---
Wilma
Betty
End of group.
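If the markers or the number of trailing lines vary, the same pattern can be assembled from variables. The sketch below assumes the markers should be matched literally, so it passes them through re.escape; build_block_pattern is a hypothetical helper, not something from the re module:

```python
import re

def build_block_pattern(start, end, extra):
    """Compile: start line, any lines (non-greedy), end line, `extra` more lines."""
    source = (
        "^" + re.escape(start) + "$"   # start marker alone on a line
        + r"(?:\n.*$)*?"               # zero or more lines, non-greedy
        + r"\n" + re.escape(end) + "$" # end marker alone on a line
        + r"(?:\n.*$){" + str(extra) + "}"  # exactly `extra` trailing lines
    )
    return re.compile(source, re.MULTILINE)

data = "Start\nfoo\n---\na\nb\nc\nStart\nbar\n---\nd\ne\n"
pattern = build_block_pattern("Start", "---", 2)
matches = pattern.findall(data)
for group in matches:
    print(group)
    print("End of group.")
```

Note that, like the pattern above, this requires each marker to be the entire line, whereas the loop-based versions only check that the line starts with the marker.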
