
Is there a way to shallow copy an existing file-object?

Tags: python

The use case for this would be creating multiple generators based on some file-object without any of them trampling each other's read state.

Originally I (thought I) had a working implementation using seek() and tell(), where each generator was decorated by a meta-generator that maintained the file-handle position. This worked fine on things like StringIO, but failed on real files due to the read-ahead buffer mutilating the offset.

Using readline() or otherwise mocking the real file-object isn't viable, as the reason for doing this was the excessively large files that prompted a generator expression in the first place. So losing the read-ahead buffer isn't really a good option. (As an aside, why was Python implemented this way in the first place? Shouldn't the buffer be like a cache and not actually exposed to the user? Proper encapsulation should have prevented this tell() issue...)

I then tried to use copy.copy, but that results in something like this: <closed file '<uninitialized file>', mode '<uninitialized file>' at 0x7f722ffda810>, which appears unusable.

Does there exist an alternative way to copy? Is there a way to initialize a file-object? Or should I give up on this use case entirely because it is not possible in Python?

ebolyen asked Oct 14 '14 18:10



1 Answer

You are looking for itertools.tee.

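from itertools import tee
with open("somefile.txt", "r") as fh:
    fh1, fh2, fh3 = tee(fh, 3)
    # Consume fh1, fh2 and fh3 inside this block, while the file is still open.
    # They are iterators over lines, not file objects (no read() or seek()).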

Once you call tee, do not use the parent iterator again. The iterators returned from tee may be used freely and independently, however.
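As a rough illustration of the tee approach, here is a minimal sketch that runs two independent passes over the same open file. The temporary file and the names lines_a, lines_b, upper and lengths are made up for the example. Keep in mind that tee buffers every item one iterator has consumed and another has not, so if one consumer runs far ahead on a very large file, memory use grows accordingly.

```python
import os
import tempfile
from itertools import tee

# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path, "r") as fh:
    lines_a, lines_b = tee(fh, 2)
    # Both iterators must be consumed while the file is still open.
    # lines_a is exhausted first here, so tee buffers every line for lines_b.
    upper = [line.strip().upper() for line in lines_a]
    lengths = [len(line.strip()) for line in lines_b]

os.remove(path)
```

Interleaving the consumers (for example with zip) keeps the internal buffer small, since neither iterator gets far ahead of the other.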

For file objects specifically (to keep file-specific methods like read), you can just open a file multiple times; each file object will maintain its own file pointer as it reads the file.

fh1, fh2, fh3 = [open("somefile.txt") for i in range(3)]

or, if you already have a file object fh:

fh1, fh2, fh3 = [open(fh.name) for i in range(3)]

This doesn't preserve an already advanced file pointer, but it's easy enough to jump ahead:

for x in fh1, fh2, fh3:
    x.seek(fh.tell())
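For the generator use case in the question, one possible shape is to let each generator open its own handle, so no read state is shared and buffering in one handle cannot disturb another. The sketch below assumes the file can be reopened by path; lines_from is a hypothetical helper, not a standard library function. It uses binary mode so that offsets are plain byte counts.

```python
import os
import tempfile

def lines_from(path, offset=0):
    """Yield lines from a private handle on `path`, starting at byte `offset`."""
    # Hypothetical helper: each call owns its handle, so generators never
    # share a file position.
    with open(path, "rb") as fh:
        fh.seek(offset)
        for line in fh:
            yield line

# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree\n")
    path = tmp.name

gen_a = lines_from(path)
gen_b = lines_from(path, offset=4)  # start just past b"one\n"

first_a = next(gen_a)   # advancing gen_a does not move gen_b
first_b = next(gen_b)
rest_a = list(gen_a)    # exhausting a generator also closes its handle
rest_b = list(gen_b)

os.remove(path)
```

Taking the starting offset from a handle that is being iterated is the fragile part: in Python 3, tell() on a text-mode file raises OSError while a for loop or next() is in progress, so record the offset before iterating, or read with readline() if the position is needed along the way.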
chepner answered Oct 02 '22 11:10
