
Is there a way to go back when reading a file using seek and calls to next()?

I'm writing a Python script to read a file. When I reach a certain section of the file, how the lines in that section should be read depends on information given in that same section. I found here that I could use something like:

fp = open('myfile')
last_pos = fp.tell()
line = fp.readline()
while line != '':
    if line == 'SPECIAL':
        fp.seek(last_pos)
        other_function(fp)
        break
    last_pos = fp.tell()
    line = fp.readline()

Yet, the structure of my current code is something like the following:

import itertools

fh = open(filename)

# get generator function and attach None at the end to stop iteration
items = itertools.chain(((lino,line) for lino, line in enumerate(fh, start=1)), (None,))
item = True

while item:
  lino, line = next(items)

  # handle special section
  if line.startswith('SPECIAL'):

    start = fh.tell()

    for i in range(specialLines):
      lino, eline = next(items)
      # etc. get the special data I need here

    # try to set the pointer to start to reread the special section
    fh.seek(start)

    # then reread the special section

But this approach gives the following error:

telling position disabled by next() call

Is there a way to prevent this?

asked Mar 27 '14 at 13:03 by aaragon




1 Answer

Using the file as an iterator (such as calling next() on it or using it in a for loop) uses an internal buffer; the actual file read position is further along the file and using .tell() will not give you the position of the next line to yield.
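
The behaviour described above can be reproduced with a few lines. This is only a sketch: it assumes Python 3 and a file opened in text mode, and the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo spam eggs\nbar\nbaz\n")
    path = tmp.name

tell_error = None
with open(path) as fh:
    first = next(fh)            # iterating the file object reads ahead
    try:
        fh.tell()               # so the position can no longer be reported
    except OSError as exc:
        tell_error = str(exc)

os.remove(path)
print(repr(first), "->", tell_error)
```

The message stored in tell_error is the same one quoted in the question.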

If you need to seek back and forth, the solution is not to use next() directly on the file object but to use file.readline() only. You can still use an iterator for that; use the two-argument version of iter():

fileobj = open(filename)
fh = iter(fileobj.readline, '')

Calling next() on fh will invoke fileobj.readline() until that function returns an empty string. In effect, this creates a file iterator that doesn't use the internal buffer, so fileobj.tell() and fileobj.seek() keep working.

Demo:

>>> fh = open('example.txt')
>>> fhiter = iter(fh.readline, '')
>>> next(fhiter)
'foo spam eggs\n'
>>> fh.tell()
14
>>> fh.seek(0)
0
>>> next(fhiter)
'foo spam eggs\n'

Note that your enumerate chain can be simplified to:

items = itertools.chain(enumerate(fh, start=1), (None,))

although I am in the dark as to why you think a (None,) sentinel is needed here; StopIteration will still be raised, albeit one more next() call later.
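
To illustrate the point about the sentinel, here is a small sketch that uses a plain list as a stand-in for the file object (the list contents are made up). It also shows one possible alternative, the default argument of next(), which avoids StopIteration without chaining anything.

```python
import itertools

lines = ["a\n", "b\n"]  # stand-in for a file object with two lines

# With the (None,) sentinel: the third next() returns None, the fourth raises.
items = itertools.chain(enumerate(lines, start=1), (None,))
seen = [next(items), next(items), next(items)]
try:
    next(items)
    raised = False
except StopIteration:
    raised = True

# Alternative: give next() a default, which is returned once the iterator is exhausted.
plain = enumerate(lines, start=1)
results = [next(plain, None) for _ in range(3)]

print(seen, raised, results)
```

Either way, unpacking the final None into lino, line would fail with a TypeError, so the loop has to test for it before unpacking.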

To read the next specialLines lines, use itertools.islice():

from itertools import islice

for lino, eline in islice(items, specialLines):
    # etc. get the special data I need here
    pass

You can also use a plain for loop over the readline-based iterator instead of an infinite loop and next() calls:

with open(filename) as fh:
    enumerated = enumerate(iter(fh.readline, ''), start=1)
    for lino, line in enumerated:
        # handle special section
        if line.startswith('SPECIAL'):
            start = fh.tell()

            for lino, eline in islice(enumerated, specialLines):
                # etc. get the special data I need here
                pass

            fh.seek(start)

but do note that your line numbers will still increment even when you seek back!
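
If the line numbers matter, one option is to count lines by hand instead of using enumerate(), and to rewind the counter together with the file position. The following is a sketch under assumptions: the file contents, the special_lines value and the seen list are invented for the demonstration.

```python
import os
import tempfile
from itertools import islice

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nSPECIAL\nx\ny\ntail\n")
    path = tmp.name

special_lines = 2   # assumed to be known here; in the question it comes from the section
seen = []           # (line number, text) pairs in the order the outer loop sees them

with open(path) as fh:
    lines = iter(fh.readline, "")   # readline-based iterator, so tell()/seek() work
    lino = 0
    for line in lines:
        lino += 1
        seen.append((lino, line.rstrip("\n")))
        if line.startswith("SPECIAL"):
            start, start_lino = fh.tell(), lino
            # First pass: consume the section to collect whatever is needed.
            for _ in islice(lines, special_lines):
                lino += 1
            # Rewind the file position and the line counter together.
            fh.seek(start)
            lino = start_lino

os.remove(path)
print(seen)
```

After the rewind, the outer loop reads the section lines again and numbers them 3 and 4 rather than 5 and 6.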

You probably want to refactor your code to not need to re-read sections of your file, however.
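
One way such a refactoring could look, as a sketch only: read the special section once into a list and make both passes over that list. The process() function, the padding logic and the sample data are hypothetical; io.StringIO stands in for a real file.

```python
import io
from itertools import islice

def process(fh, special_lines):
    """Hypothetical example: pad every line of a SPECIAL section to the longest one."""
    results = []
    numbered = enumerate(fh, start=1)   # plain iteration is fine, no tell()/seek() needed
    for lino, line in numbered:
        if line.startswith("SPECIAL"):
            # Read the section once and keep it in memory.
            section = list(islice(numbered, special_lines))
            # First pass: gather the information that decides how to read the lines.
            width = max((len(text.rstrip("\n")) for _, text in section), default=0)
            # Second pass: handle the same lines using that information.
            for sec_lino, text in section:
                results.append((sec_lino, text.rstrip("\n").ljust(width)))
    return results

sample = io.StringIO("header\nSPECIAL\nab\nabcd\ntail\n")
out = process(sample, special_lines=2)
print(out)
```

Because nothing is re-read from disk, the line numbers stay correct and the buffered file iterator can be used as usual.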

answered Sep 20 '22 at 17:09 by Martijn Pieters