Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python nested loop - get next N lines

I'm new to Python and trying to do a nested loop. I have a very large file (1.1 million rows), and I'd like to use it to create a file that has each line along with the next N lines, for example with the next 3 lines:

1    2
1    3
1    4
2    3
2    4
2    5

Right now I'm just trying to get the loops working with rownumbers instead of the strings since it's easier to visualize. I came up with this code, but it's not behaving how I want it to:

with open('C:/working_file.txt', mode='r', encoding = 'utf8') as f: 
for i, line in enumerate(f):
     line_a = i
     lower_bound = i + 1
     upper_bound = i + 4
     with open('C:/working_file.txt', mode='r', encoding = 'utf8') as g:
        for j, line in enumerate(g):
            while j >= lower_bound and j <= upper_bound:
                line_b = j
                j = j+1
                print(line_a, line_b)

Instead of the output I want like above, it's giving me this:

990     991
990     992
990     993
990     994
990     992
990     993
990     994
990     993
990     994
990     994

As you can see the inner loop is iterating multiple times for each line in the outer loop. It seems like there should only be one iteration per line in the outer loop. What am I missing?

EDIT: My question was answered below, here is the exact code I ended up using:

from collections import deque
from itertools import cycle
log = open('C:/example.txt', mode='w', encoding = 'utf8') 
try:
    xrange 
except NameError: # python3
    xrange = range

def pack(d):
    tup = tuple(d)
    return zip(cycle(tup[0:1]), tup[1:])

def window(seq, n=2):
    it = iter(seq)
    d = deque((next(it, None) for _ in range(n)), maxlen=n)
    yield pack(d)
    for e in it:
        d.append(e)
        yield pack(d)

for l in window(open('c:/working_file.txt', mode='r', encoding='utf8'),100):
    for a, b in l:
        print(a.strip() + '\t' + b.strip(), file=log)
like image 563
raspberry_door Avatar asked Dec 10 '13 00:12

raspberry_door


People also ask

How do you go to the next line in a for loop in Python?

Python file method next() is used when a file is used as an iterator, typically in a loop, the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit.

How many nested for loops is too many?

Microsoft BASIC had a nesting limit of 8. @Davislor: The error message refers to the stack size in the compiler, which is also a program, and which is recursively processing the nested-looped construct.

What is nested FOR NEXT loop?

When a loop is nested inside another loop, the inner loop runs many times inside the outer loop. In each iteration of the outer loop, the inner loop will be re-started. The inner loop must finish all of its iterations before the outer loop can continue to its next iteration.


2 Answers

Based on window example from old docs you can use something like:

from collections import deque
from itertools import cycle

try:
    xrange 
except NameError: # python3
    xrange = range

def pack(d):
    tup = tuple(d)
    return zip(cycle(tup[0:1]), tup[1:])

def window(seq, n=2):
    it = iter(seq)
    d = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield pack(d)
    for e in it:
        d.append(e)
        yield pack(d)

Demo:

>>> for l in window([1,2,3,4,5], 4):
...     for l1, l2 in l:
...         print l1, l2
...
1 2
1 3
1 4
2 3
2 4
2 5

So, basically you can pass your file to window to get desired result:

window(open('C:/working_file.txt', mode='r', encoding='utf8'), 4)
like image 112
alko Avatar answered Oct 06 '22 01:10

alko


You can do this with slices. This is easiest if you read the whole file into a list first:

with open('C:/working_file.txt', mode='r', encoding = 'utf8') as f: 
    data = f.readlines()

for i, line_a in enumerate(data):
    for j, line_b in enumerate(data[i+1:i+5], start=i+1):
        print(i, j)

When you change it to printing the lines instead of the line numbers, you can drop the second enumerate and just do for line_b in data[i+1:i+5]. Note that the slice includes the item at the start index, but not the item at the end index, so that needs to be one higher than your current upper bound.

like image 33
lvc Avatar answered Oct 06 '22 00:10

lvc