
How to use io primitives (seek, read) on file stream that may be in universal mode?

Tags: python, io, file-io

I have a file object which may or may not be open in universal mode. (I can access this mode with file.mode, if that helps).

I want to deal with this file using the standard io methods: read and seek.

If I open the file in non-universal mode, everything works nicely:

In [1]: f = open('example', 'r')

In [2]: f.read()
Out[2]: 'Line1\r\nLine2\r\n' # uhoh, this file has carriage returns

In [3]: f.seek(0)

In [4]: f.read(8)
Out[4]: 'Line1\r\nL'

In [5]: f.seek(-8, 1)

In [6]: f.read(8)
Out[6]: 'Line1\r\nL' # as expected, this is the same as before

In [7]: f.close()

However, if I open the file in universal mode, we have a problem:

In [8]: f = open('example', 'rU')

In [9]: f.read()
Out[9]: 'Line1\nLine2\n' # no carriage returns - thanks, 'U'!

In [10]: f.seek(0)

In [11]: f.read(8)
Out[11]: 'Line1\nLi'

In [12]: f.seek(-8, 1)

In [13]: f.read(8)
Out[13]: 'ine1\nLin' # NOT the same output, as what we read as '\n' was *2* bytes

Python interprets the \r\n as a \n, and returns a string of length 8.

However, creating this string involved reading 9 bytes from the file.

As a result, when trying to reverse the read using seek, we don't get back to where we started!
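
The mismatch between characters returned and bytes consumed can be checked with a little arithmetic. The sketch below is only a model of the translation, not how the file object is implemented: `raw_bytes_consumed` is a hypothetical helper that treats each '\r\n' pair as a single character and ignores lone '\r' characters for brevity.

```python
def raw_bytes_consumed(data, n_chars):
    # Hypothetical helper: count the raw bytes needed to produce n_chars
    # characters when every '\r\n' pair is translated to a single '\n'.
    produced = 0
    i = 0
    while i < len(data) and produced < n_chars:
        if data[i:i + 2] == b'\r\n':
            i += 2   # two bytes on disk, one character returned
        else:
            i += 1
        produced += 1
    return i

data = b'Line1\r\nLine2\r\n'
print(raw_bytes_consumed(data, 8))  # 9
```

With these contents, a relative seek of -8 after the read lands on byte 1 rather than byte 0, which matches the 'ine1\nLin' output above.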


Is there a way to identify that we consumed a 2-byte newline or, better yet, disable this behaviour?

The best I can come up with at the moment is to do a tell before and after the read, and check how much we actually got, but that seems incredibly inelegant.
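
One way to avoid the translation altogether, assuming the file can be opened (or reopened by name) in binary mode, is sketched below. The temporary file and its contents are made up for the demonstration. In binary mode the '\r\n' pairs come back untouched, so the caller has to deal with them, but relative seeks then count exactly the bytes that read returned.

```python
import os
import tempfile

# Create a throwaway file with Windows-style line endings for the demo.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'Line1\r\nLine2\r\n')

with open(path, 'rb') as f:   # binary mode: no newline translation
    first = f.read(8)         # b'Line1\r\nL'
    f.seek(-8, 1)             # step back over exactly the bytes just read
    second = f.read(8)

os.remove(path)
print(first == second)  # True
```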


As an aside, it seems to me that this behaviour is actually contrary to the documentation of read:

In [54]: f.read?
Type:       builtin_function_or_method
String Form:<built-in method read of file object at 0x1a35f60>
Docstring:
read([size]) -> read at most size bytes, returned as a string.

If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.

To my reading, that suggests that at most size bytes should be read, not returned.

In particular, I believe that the correct semantics of the above example should be:

In [11]: f.read(8)
Out[11]: 'Line1\nL' # return a string of length *7*

Am I misunderstanding the documentation?

asked Jun 28 '14 at 11:06 by sapi


1 Answer

What are you really trying to do?

If your reason for reading forwards and then seeking backwards is that you want to return to a particular point in the file, then use tell() to record where you are. That's easier than keeping track of how many bytes you read.

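savepos = f.tell()   # remember the current position
f.read(8)
f.seek(savepos)      # absolute seek back to the saved position
f.read(8)            # returns the same data as the first read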
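
The same save-and-restore idea can be wrapped up so the rewind happens automatically. This is a minimal sketch, not a standard-library facility: `restore_position` is a hypothetical helper, and the demo file is made up. It is written for Python 3, where text mode translates newlines by default and no 'U' flag is needed.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def restore_position(f):
    # Hypothetical helper, not part of the standard library.
    saved = f.tell()          # record the position on entry
    try:
        yield f
    finally:
        f.seek(saved)         # absolute seek, so newline translation does not matter

# Create a throwaway file with Windows-style line endings for the demo.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'Line1\r\nLine2\r\n')

with open(path, 'r') as f:    # Python 3 text mode translates '\r\n' to '\n'
    with restore_position(f):
        first = f.read(8)
    second = f.read(8)        # position was restored, so this rereads the same data

os.remove(path)
print(first == second)  # True
```

Because the position is restored with a value that came from tell() rather than computed from the length of the returned string, it does not matter how many bytes each translated newline occupied on disk.
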
answered Oct 17 '22 at 18:10 by Colin Phipps