Python file I/O file.seek() vs file.read() pointer behavior

Tags:

python

I ran across an issue where I didn't get the expected output based on my understanding.

The difference between the two code snippets is file.seek(3,0) vs file.read(3).

In both cases I verified that the file pointer is at position 3; however, when I write to the file I get different results. Can someone explain why this is?

#=====Code Snippet #1===========

# original file has "hi world!"
with open(filename, 'r+') as file:
    file.seek(3,0) # we move the file pointer to index 3
    print(file.tell()) # prints 3
    file.write("friend!")
# file now has "hi friend!"  <--- AS EXPECTED


# ======Code Snippet #2===========

# original file has "hi world!"
with open(filename, 'r+') as file:
    file.read(3) # We read characters at index 0,1,2 and move the file pointer to index 3
    print(file.tell()) # prints 3
    file.write("friend!")
# file now has "hi world!friend!" <--- NOT AS EXPECTED
# =====================================
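
To try both snippets without an existing file, the sketch below recreates "hi world!" in a temporary file before each run. The helper name run_snippet and the use of tempfile are illustrative assumptions, not part of the original question.

```python
import os
import tempfile


def run_snippet(use_seek: bool) -> tuple:
    """Recreate the question's file, apply one snippet, return (tell(), final contents)."""
    # A throwaway file stands in for the question's `filename`.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write("hi world!")
        with open(path, "r+") as file:
            if use_seek:
                file.seek(3, 0)  # Snippet #1
            else:
                file.read(3)  # Snippet #2
            pos = file.tell()
            file.write("friend!")
        with open(path) as f:
            return pos, f.read()
    finally:
        os.remove(path)


print(run_snippet(use_seek=True))   # (3, 'hi friend!')
print(run_snippet(use_seek=False))  # the question reports (3, 'hi world!friend!')
```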
asked Apr 14 '26 19:04 by Aleksandr Gontcharov



2 Answers

You are opening the file in the default text mode, because you did not give a "b" specifier for binary mode, as in open(file, "rb+"). Text mode is the default (it can also be requested explicitly with the "t" specifier, as in open(file, "rt+")), which means all file I/O goes through a text-oriented layer, and reads do buffered I/O underneath.
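
The layering described here can be inspected directly. This sketch, which uses a hypothetical scratch file, checks the stream types that open() returns for the two modes; a TextIOWrapper exposes the binary stream it wraps through its .buffer attribute.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only to inspect the stream types.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "r+") as text_file:
    # Text mode: a text layer that decodes bytes read from a buffered binary stream.
    text_layer = type(text_file)
    text_buffer = type(text_file.buffer)

with open(path, "rb+") as binary_file:
    # Binary mode: the buffered binary stream itself, with no text layer on top.
    binary_layer = type(binary_file)

os.remove(path)

print(text_layer is io.TextIOWrapper)     # True
print(text_buffer is io.BufferedRandom)   # True
print(binary_layer is io.BufferedRandom)  # True
```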

So, although .tell() reports a cursor position from which a read would continue, the OS-level file pointer is already at the end of the file (the text layer read ahead past position 3), and writing resumes from there.
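
The mismatch can be observed by comparing the text wrapper's tell() with the position of the binary stream underneath it (file.buffer). A minimal sketch, assuming CPython and a scratch file holding the question's "hi world!":

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hi world!")

with open(path, "r+") as file:
    file.read(3)
    text_pos = file.tell()           # logical position reported by the text layer
    buffer_pos = file.buffer.tell()  # actual position of the underlying binary stream

os.remove(path)
print(text_pos, buffer_pos)  # on CPython: 3 9 (the text layer read ahead the whole small file)
```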

Yes, this is plainly incorrect behavior, but seeking to an exact position and writing in the middle of a file have never been expected to work with files opened in text mode. The most visible change in this mode is line-ending translation (on Windows, the two-byte sequence \r\n is transparently translated to a single \n character), so file positioning in text mode is not regarded as deterministic, and the extra read buffering you ran into ensures that it is not.
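
A small illustration of why character counts and byte offsets diverge in text mode: the sketch forces Windows-style newline translation with newline="\r\n" (so it behaves the same on every OS) and compares the three characters written with the bytes that land on disk. The scratch file is a hypothetical stand-in.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# newline="\r\n" translates each "\n" written into "\r\n" on disk, as Windows does by default.
with open(path, "w", newline="\r\n") as f:
    f.write("a\nb")  # 3 characters

with open(path, "rb") as f:
    on_disk = f.read()

with open(path) as f:
    read_back = f.read()  # universal-newlines mode turns "\r\n" back into "\n"

os.remove(path)
print(len("a\nb"), len(on_disk), repr(read_back))  # 3 4 'a\nb'
```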

So the workaround for your issue is simply to work with the file in binary mode, which is the only mode that is actually expected to support seeking and reading or writing bytes at precise positions in the file:


In [51]: filename = "file.txt"

In [52]: open(filename, "wb").write(b"hi world!")
Out[52]: 9


In [54]: # original file has "hi world!"
    ...: with open(filename, 'rb+') as file:
    ...:     file.read(3) # We read bytes at index 0,1,2 and move the file pointer to index 3
    ...:     print(file.tell()) # prints 3
    ...:     file.write(b"friend!")
    ...: # file now has "hi friend!"
    ...: 
3

In [55]: open(filename).read()
Out[55]: 'hi friend!'

(Also note that you must write a bytes object, such as a literal with the b prefix, not a str, to a file opened in binary mode.)
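
To make the bytes-versus-str point concrete, this sketch (using a hypothetical scratch file) shows a binary-mode stream rejecting a str with TypeError and accepting the same text once it is encoded to bytes.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

error = None
with open(path, "rb+") as file:
    try:
        file.write("friend!")  # str: rejected by a binary stream
    except TypeError as exc:
        error = exc
    file.write("friend!".encode("ascii"))  # bytes: accepted

with open(path, "rb") as f:
    contents = f.read()

os.remove(path)
print(type(error).__name__, contents)  # TypeError b'friend!'
```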

answered Apr 21 '26 02:04 by jsbueno



You need a seek call between reading and writing:

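import os

with open(filename, 'r+') as file:  # filename: the file from the question
    file.read(3)
    # Seeking to the current position re-syncs the underlying buffer
    # with the text layer's logical position (3) before writing.
    file.seek(0, os.SEEK_CUR)
    file.write("friend!")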

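Unfortunately, I don't think Python documents this anywhere. C documents a similar requirement (on a stream opened for update, input must not be directly followed by output without an intervening call to a file-positioning function such as fseek, unless the input reached end-of-file), but I think Python's requirement is completely undocumented.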
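
A self-contained check of this fix, run against a temporary copy of the question's "hi world!" file; the helper name overwrite_after_read is hypothetical.

```python
import os
import tempfile


def overwrite_after_read(path):
    """Read 3 characters, re-sync with a no-op seek, then overwrite from there."""
    with open(path, "r+") as file:
        file.read(3)
        file.seek(0, os.SEEK_CUR)  # seek to the current logical position (3)
        file.write("friend!")
    with open(path) as file:
        return file.read()


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("hi world!")

result = overwrite_after_read(path)
os.remove(path)
print(result)  # hi friend!
```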

Note that with variable-width text encodings, trying to write N characters into the middle of a file isn't guaranteed to overwrite N existing characters. If you're not careful, you can easily end up overwriting half a character and corrupting your file.
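
Here is a sketch of that hazard under UTF-8 (the encoding is an assumption; the answer does not name one): "é" takes two bytes, so writing the one-character "e" over it leaves a stray continuation byte behind and the file no longer decodes. The scratch file is hypothetical.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# "h\u00e9!" ("hé!") is 4 bytes in UTF-8: b"h", b"\xc3\xa9" for é, then b"!".
with open(path, "wb") as f:
    f.write("h\u00e9!".encode("utf-8"))

with open(path, "r+", encoding="utf-8") as file:
    file.read(1)               # consume "h"
    file.seek(0, os.SEEK_CUR)  # re-sync before writing
    file.write("e")            # 1 character / 1 byte, but é occupied 2 bytes

with open(path, "rb") as f:
    raw = f.read()             # the second byte of é is left behind

os.remove(path)

try:
    raw.decode("utf-8")
    corrupted = False
except UnicodeDecodeError:
    corrupted = True

print(raw, corrupted)  # b'he\xa9!' True
```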

answered Apr 21 '26 02:04 by user2357112 supports Monica



