
Python file.tell() giving strange numbers?

I am using Python 3.3.0 on 64-bit Windows.

I have a text file as shown below: (see bottom for download link at mediafire)

hello

-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah


-data2:blah blah blah blah blah blah blah blah blah blah blah
-data3: Empty

-data4: Empty

I'm trying to navigate around the file, and thus I use .tell() to figure out what my position is. However, when reading through the lines of the file as shown below, I get a very strange result:

f=open("test.txt")
while True:
    a = f.readline()
    print("{}    {}".format(repr(a),f.tell()))
    if a == "":
        break

The result:

'hello\n'    7
'\n'    9
'-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah\n'    18446744073709551714
'\n'    99
'\n'    101
'-data2:blah blah blah blah blah blah blah blah blah blah blah\n'    164
'-data3: Empty\n'    179
'\n'    181
'-data4: Empty'    194
''    194

What's with the 18446744073709551714 for the 3rd line? Though it looks like an impossible value, f.seek(18446744073709551714) is accepted and apparently does bring me to the end of the 3rd line. I can't figure out why.
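One way to see that the number is not random is to split it at bit 64. This is only a sketch: the arithmetic is exact, but reading the two halves as "byte offset plus decoder state" is an assumption, based on how CPython's pure-Python `_pyio.TextIOWrapper` packs its tell() cookies, and is not something the output above confirms.

```python
value = 18446744073709551714  # the tell() result from the question

# Split the integer at bit 64.
low = value & (2**64 - 1)   # low 64 bits
high = value >> 64          # everything above bit 63

print(low, high)  # prints: 98 1
```

The low part, 98, is one byte past the 97 that binary mode reports for the same line, and the high part is a single set bit. A plausible reading (again, an assumption) is that the newline decoder had already consumed the following '\r' and recorded that in the flag.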

EDIT: Opening in binary mode gives no problems with tell():

f=open("test.txt","rb")
while True:
    a = f.readline()
    print("{}    {}".format(repr(a),f.tell()))
    if a == b"":
        break

The result:

b'hello\r\n'    7
b'\r\n'    9
b'-data1:blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah\r\n'    97
b'\r\n'    99
b'\r\n'    101
b'-data2:blah blah blah blah blah blah blah blah blah blah blah\r\n'    164
b'-data3: Empty\r\n'    179
b'\r\n'    181
b'-data4: Empty'    194
b''    194
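Since binary mode reports plain byte offsets, one possible workaround is to read in binary mode and decode each line by hand. The following is a minimal sketch under stated assumptions: the helper name `line_offsets`, the sample bytes, and the ASCII encoding are all hypothetical stand-ins, not part of the original test.txt.

```python
import os
import tempfile

# Hypothetical sample with Windows line endings, modelled on the question's file.
SAMPLE = b"hello\r\n\r\n-data1:blah blah\r\n"

def line_offsets(path, encoding="ascii"):
    """Return (byte_offset, decoded_line) pairs, one per line of the file."""
    result = []
    with open(path, "rb") as f:
        while True:
            offset = f.tell()          # in binary mode this is a real byte offset
            raw = f.readline()
            if not raw:                # b"" signals end of file
                break
            result.append((offset, raw.decode(encoding).rstrip("\r\n")))
    return result

# Write the sample to a temporary file, scan it, then clean up.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(SAMPLE)
    offsets = line_offsets(path)
finally:
    os.remove(path)

print(offsets)
```

The offsets collected this way can be passed to seek() on a file opened with "rb", at the cost of handling decoding and line endings manually.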

The test.txt text file is downloadable here, just a tiny 194 bytes: http://www.mediafire.com/?1wm4lujb2j48y23

asked Apr 10 '13 19:04 by Eric


1 Answer

It's a documented behaviour caused by UNIX-style line endings:

file.tell()

Return the file’s current position, like stdio's ftell().

Note: On Windows, tell() can return illegal values (after an fgets()) when reading files with Unix-style line-endings. Use binary mode ('rb') to circumvent this problem.


The above is taken from the Python 2.7.4 documentation. The Python 3 documentation changed somewhat, since there is now a hierarchy of classes that handle I/O, and I can't find this note there. Your test shows that the behaviour didn't change anyway. Also, the Python 3.3 source code has the comment "XXX Windows support below is likely incomplete" before the function called by tell.


There is an issue in the Python bug tracker related to this, and the final comment, by Catalin Iacob, is:

I tried to reproduce this, picked a file on my disk and indeed I got a negative number, but that file has Unix line endings. This is documented at http://docs.python.org/2/library/stdtypes.html#file.tell so probably there's nothing to do then.

As for Armin's report in msg180145, even though it's not intuitive, this matches ftell's behavior on Windows, as documented in the Remarks section of http://msdn.microsoft.com/en-us/library/0ys3hc0b%28v=vs.100%29.aspx. The tell() method on fileobjects is explicitly documented as matching ftell behavior: "Return the file’s current position, like stdio‘s ftell()". So even though it's not intuitive at all, it's probably better to leave it as is. tell() returns the intuitive non zero position when opening with 'a' on Python3 and on Python 2.7 when using io.open so it's fixed for the future anyway.

So it seems like a "wontfix" bug. Someone should probably open an issue (I commented on the existing one), because this fact is not mentioned at all in the Python 3 documentation.


According to Antoine Pitrou, Python 3 doesn't use ftell() at all, so this seems to be a different bug. Also, the bug is not reproducible in Python 3.2.3 and was probably introduced when fixing this issue (at least, it's the only change I can find to the implementation of tell() between 3.2.3 and 3.3).


Last edit: According to the io module documentation, the tell method of a text file does not return the number of bytes since the beginning of the file. The returned value is an "opaque number", which means that the only way you can use it is to pass it to seek to get back to that position. Other operations aren't meaningful. The fact that until Python 3.2.3 the value returned was what you'd expect was only an implementation detail.
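To illustrate the one supported use of the value, here is a minimal sketch that saves the result of tell() and passes it straight back to seek(). The helper name `reread_from_cookie` and the sample bytes are hypothetical, chosen only to resemble the file in the question.

```python
import os
import tempfile

# Hypothetical sample with Windows line endings, modelled on the question's file.
SAMPLE = b"hello\r\n\r\n-data1:blah blah\r\n\r\n-data2:blah\r\n"

def reread_from_cookie(path):
    """Read a line, rewind to the saved tell() value, and read the line again."""
    with open(path, "r", encoding="ascii") as f:
        f.readline()              # skip 'hello\n'
        f.readline()              # skip the blank line
        cookie = f.tell()         # opaque: no arithmetic, no comparison with sizes
        first = f.readline()
        f.seek(cookie)            # the only supported use of the cookie
        second = f.readline()
    return first, second

# Write the sample to a temporary file, run the round trip, then clean up.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(SAMPLE)
    first, second = reread_from_cookie(path)
finally:
    os.remove(path)

print(first == second)  # prints: True
```

Whatever number tell() happens to return, the round trip yields the same line both times, which is all the documentation promises for text files.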

Note that the information in this section of the documentation is simply wrong and, hopefully, it will be fixed in the future.

answered Oct 26 '22 08:10 by Bakuriu