I have a file object which may or may not be open in universal mode. (I can access this mode with file.mode, if that helps.)
I want to deal with this file using the standard io methods: read and seek.
If I open the file in non-universal mode, everything works nicely:
In [1]: f = open('example', 'r')
In [2]: f.read()
Out[2]: 'Line1\r\nLine2\r\n' # uhoh, this file has carriage returns
In [3]: f.seek(0)
In [4]: f.read(8)
Out[4]: 'Line1\r\nL'
In [5]: f.seek(-8, 1)
In [6]: f.read(8)
Out[6]: 'Line1\r\nL' # as expected, this is the same as before
In [7]: f.close()
However, if I open the file in universal mode, we have a problem:
In [8]: f = open('example', 'rU')
In [9]: f.read()
Out[9]: 'Line1\nLine2\n' # no carriage returns - thanks, 'U'!
In [10]: f.seek(0)
In [11]: f.read(8)
Out[11]: 'Line1\nLi'
In [12]: f.seek(-8, 1)
In [13]: f.read(8)
Out[13]: 'ine1\nLin' # NOT the same output, as what we read as '\n' was *2* bytes
Python interprets the \r\n as a \n, and returns a string of length 8.
However, creating this string involved reading 9 bytes from the file.
As a result, when trying to reverse the read using seek, we don't get back to where we started!
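To make the mismatch concrete, here is a minimal sketch that counts how many raw bytes are needed to produce a given number of translated characters. The helper name bytes_consumed is hypothetical (it is not part of Python's file API), and it only mimics the newline translation rather than reproducing the real implementation:

```python
def bytes_consumed(raw, n_chars):
    """Hypothetical helper: how many bytes of `raw` are needed to yield
    `n_chars` characters once '\\r\\n' is translated to '\\n'."""
    pos = 0
    chars = 0
    while chars < n_chars and pos < len(raw):
        if raw[pos:pos + 2] == b"\r\n":
            # A two-byte '\r\n' pair collapses into one '\n' character.
            pos += 2
        else:
            # Any other byte (including a lone '\r') is one character.
            pos += 1
        chars += 1
    return pos


raw = b"Line1\r\nLine2\r\n"
print(bytes_consumed(raw, 8))  # 9
```

So a read(8) that crosses one \r\n moves the file position by 9, and seeking back by 8 lands one byte past the original position.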
Is there a way to identify that we consumed a 2-byte newline or, better yet, disable this behaviour?
The best I can come up with at the moment is to do a tell before and after the read, and check how much we actually got, but that seems incredibly inelegant.
As an aside, it seems to me that this behaviour is actually contrary to the documentation of read:
In [54]: f.read?
Type: builtin_function_or_method
String Form:<built-in method read of file object at 0x1a35f60>
Docstring:
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
To my reading, that suggests that at most size bytes should be read, not returned.
In particular, I believe that the correct semantics of the above example should be:
In [11]: f.read(8)
Out[11]: 'Line1\nL' # return a string of length *7*
Am I misunderstanding the documentation?
What are you really trying to do?
If your reason for reading forwards and then seeking backwards is that you want to return to a particular point in the file, then use tell() to record where you are. That's easier than keeping track of how many bytes you read.
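savepos = f.tell()  # remember the current position
f.read(8)
f.seek(savepos)     # jump back to it, however many bytes the read consumed
f.read(8)           # returns the same data as the first read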
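For anyone who wants to try this end to end, here is a self-contained sketch of the same idea. It assumes io.open with newline=None as a stand-in for the 'rU' mode used in the question (both translate \r\n to \n on read), and it builds its own temporary file, so the path and contents are purely illustrative:

```python
import io
import os
import tempfile

# Build a small file with Windows line endings, like the one in the question.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"Line1\r\nLine2\r\n")

# newline=None requests universal-newline translation when reading.
with io.open(path, "r", encoding="ascii", newline=None) as f:
    f.read(3)            # move away from the start of the file
    savepos = f.tell()   # marker for the current position
    first = f.read(8)    # this read crosses a two-byte '\r\n'
    f.seek(savepos)      # go back to the marker, not "back by 8"
    second = f.read(8)

os.remove(path)
print(first == second)
```

Treat the value returned by tell() as an opaque marker to hand back to seek(), rather than as a number to do arithmetic on.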