
Are there cases where fseek/ftell can give the wrong file size?

In C or C++, the following can be used to return a file size:

const unsigned long long at_beg = (unsigned long long) ftell(filePtr);
fseek(filePtr, 0, SEEK_END);
const unsigned long long at_end = (unsigned long long) ftell(filePtr);
const unsigned long long length_in_bytes = at_end - at_beg;
fprintf(stdout, "file size: %llu\n", length_in_bytes);

Are there development environments, compilers, or OSes that can return the wrong file size from this code, based on padding or other situation-specific information? Were there changes in the C or C++ specification around 1999 that would have led to this code no longer working in certain cases?

For this question, please assume I am adding large file support by compiling with the flags -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1. Thanks.

Alex Reynolds asked Feb 03 '12 20:02




3 Answers

It won't work on unseekable files such as /proc/cpuinfo, /dev/stdin, or /dev/tty, or on pipes obtained with popen.

And it won't work reliably if the file is being written by another process at the same time.

Using the POSIX stat function is probably more efficient and more reliable. Of course, this function might not be available on non-POSIX systems.
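A minimal sketch of the stat approach, assuming a POSIX system. The helper name file_size_stat and the choice to reject anything that is not a regular file are illustrative assumptions, not part of the original answer.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: size in bytes of the regular file at `path`,
 * or -1 if stat() fails or the path is not a regular file. */
static long long file_size_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;              /* errno describes the failure */
    if (!S_ISREG(st.st_mode))
        return -1;              /* st_size is unspecified for pipes and most devices */
    return (long long) st.st_size;
}
```

When the file is already open, fstat(fileno(fp), &st) gives the same information without a second path lookup.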

Basile Starynkevitch answered Sep 28 '22 00:09


The fseek and ftell functions are both defined by the ISO C language standard.

The following is from the latest public draft of the 2011 C standard, but the 1990, 1999, and 2011 ISO C standards are all very similar in this area, if not identical.

7.21.9.4:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

7.21.9.2:

The fseek function sets the file position indicator for the stream pointed to by stream. If a read or write error occurs, the error indicator for the stream is set and fseek fails.

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.

Violating any of the "shall" clauses makes your program's behavior undefined.

So if the file was opened in binary mode, ftell gives you the number of characters from the beginning of the file -- but an fseek relative to the end of the file (SEEK_END) is not necessarily meaningful. This accommodates systems that store binary files in whole blocks and don't keep track of how much was written to the final block.
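For the binary-mode case, a hedged sketch of the fseek/ftell method with every return value checked might look like the following. The helper name stream_size is hypothetical, and the code assumes a platform where seeking to SEEK_END on a binary stream is meaningful, which the standard does not guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: offset of end-of-file for a seekable stream opened in
 * binary mode, or -1 on failure. The caller's file position is restored. */
static long stream_size(FILE *fp)
{
    long saved = ftell(fp);
    if (saved == -1L)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    long end = ftell(fp);               /* -1L if this call fails */
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1;
    return end;
}
```

Unlike the snippet in the question, this returns the end offset itself rather than the difference from the starting position, so the result does not depend on where the stream was positioned beforehand.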

If the file was opened in text mode, you can seek to the beginning or end of the file with an offset of 0, or you can seek to a position given by an earlier call to ftell; fseek with any other arguments has undefined behavior. This accommodates systems where the number of characters read from a text file doesn't necessarily correspond to the number of bytes in the file. For example, on Windows reading a CR-LF pair ("\r\n") reads only one character, but advances 2 bytes in the file.

In practice, on Unix-like systems text and binary modes behave the same way, and the fseek/ftell method will work. I suspect it will work on Windows (my guess is that ftell will give the byte offset, which may not be the same as the number of times you could call getchar() in text mode).

Note also that ftell() returns a result of type long. On systems where long is 32 bits, this method can't work for files that are 2 GiB or larger.
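On POSIX systems, one way around the width of long is the fseeko/ftello pair, which use off_t instead. The sketch below assumes a POSIX environment and that the program is compiled with -D_FILE_OFFSET_BITS=64, as stated in the question, so that off_t is 64 bits even where long is 32 bits. The helper name stream_size_large is hypothetical.

```c
/* Request POSIX declarations if no feature-test macro is active yet. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: like the fseek/ftell method, but with off_t offsets.
 * Returns the offset of end-of-file, or (off_t) -1 on failure. */
static off_t stream_size_large(FILE *fp)
{
    off_t saved = ftello(fp);
    if (saved == (off_t) -1)
        return (off_t) -1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t) -1;
    off_t end = ftello(fp);             /* (off_t) -1 if this call fails */
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t) -1;
    return end;
}
```

These functions are not part of ISO C, so this trades one portability concern for another.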

Since the fseek/ftell method is system-specific anyway, you might be better off using some other system-specific method to get the size of a file, such as stat() on Unix-like systems.

On the other hand, fseek and ftell are likely to work as you expect on most systems you're likely to encounter. I'm sure there are systems where it won't work; sorry, but I don't have specifics.

If working on Linux and Windows is good enough, and you're not concerned with large files, then the fseek/ftell method is probably ok. Otherwise, you should consider using a system-specific method to determine the size of a file.

And keep in mind that anything that tells you the size of a file can only tell you its size at that moment. The file's size could change before you access it.
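One practical consequence is to treat any reported size as a hint and rely on what the reads actually return. The following is only a sketch of that idea; the helper name count_bytes_by_reading and the 4096-byte buffer are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in the stream by reading
 * to end-of-file. Stores the count in *count and returns 0 on success,
 * or -1 if a read error occurred. Works on unseekable streams too. */
static int count_bytes_by_reading(FILE *fp, unsigned long long *count)
{
    unsigned char buf[4096];
    size_t n;

    *count = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        *count += n;
    return ferror(fp) ? -1 : 0;
}
```

This consumes the stream, so it suits code that processes the data in the same pass instead of measuring it first.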

Keith Thompson answered Sep 28 '22 00:09


1) Superficially, your code looks "OK" - I don't see any problem with it.

2) No - there isn't any "C or C++ specification" that would affect fseek. There is a POSIX specification:

  • http://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html

3) If you want "file size", my first choice would probably be "stat()". Here's the POSIX specification:

  • http://pubs.opengroup.org/onlinepubs/007904975/functions/stat.html

4) If something's "going wrong" with your method, then my first guess would be "large file support".

For example, many OS's had parallel "fseek()" and "fseek64()" APIs.

'Hope that helps .. PSM

paulsm4 answered Sep 28 '22 02:09