
Quick-fixing 32-bit (2GB limited) fseek/ftell on FreeBSD 7

I have an old 32-bit C/C++ program on FreeBSD that is used remotely by hundreds of users, and its author will not fix it. It was written in an unsafe way: all file offsets are stored internally as unsigned 32-bit values, and the ftell/fseek functions were used. On FreeBSD 7 (the host platform for the software), this means that ftell and fseek use a 32-bit signed long:

 int fseek(FILE *stream, long offset, int whence);

 long ftell(FILE *stream);

I need a quick fix for the program, because some internal data files suddenly hit the 2^31 file size (2 147 483 7yy bytes) after 13 years of collecting data, and the internal fseek/ftell asserts now fail on every request.

In the FreeBSD 7 world there is the fseeko/ftello hack for 2GB+ files.

 int
 fseeko(FILE *stream, off_t offset, int whence);

 off_t
 ftello(FILE *stream);

The off_t type here is not well-defined; all I know so far is that it is 8 bytes in size and looks like either long long or unsigned long long (I don't know which one).

Is it enough (to work with files up to 4 GB) and safe to search-and-replace all ftell with ftello and all fseek with fseeko (sed -i 's/ftell/ftello/', and the same for fseek), if the possible usages of them are:

 unsigned long offset1,offset2; //32bit
 offset1 = (compute + it) * in + some - arithmetic;
 fseek(file, 0, SEEK_END);
 fseek(file, 4, SEEK_END); // or other small int constant

 offset2 = ftell(file);
 fseek(file, offset1, SEEK_SET);  // No usage of SEEK_CUR

and combinations of such calls.

What is the signedness of off_t? Is it safe to assign a 64-bit off_t to an unsigned 32-bit offset? Will it work for byte offsets in the range from 2 GB up to 4 GB?

Which functions, besides ftell/fseek, may be used for working with offsets?

osgx asked Jun 29 '14 01:06


1 Answer

FreeBSD's fseeko() and ftello() are documented as POSIX.1-2001 compatible, which means off_t is a signed integer type.
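If you would rather confirm this on the machine itself than rely on the documentation, a check along these lines should do. It is only a sketch; the helper names off_t_is_signed, off_t_bits and require_wide_signed_off_t are made up for illustration.

```c
#include <assert.h>
#include <limits.h>     /* CHAR_BIT */
#include <sys/types.h>  /* off_t */

/* Returns 1 if off_t is signed. Converting -1 to an unsigned type
   gives a large positive value, which is never below zero. */
static int off_t_is_signed(void)
{
    return (off_t)-1 < (off_t)0;
}

/* Width of off_t in bits (padding bits, if any, included). */
static int off_t_bits(void)
{
    return (int)(sizeof (off_t) * CHAR_BIT);
}

/* Aborts early if the assumptions used in this answer do not hold. */
static void require_wide_signed_off_t(void)
{
    assert(off_t_is_signed());
    assert(off_t_bits() >= 64);
}
```

Calling require_wide_signed_off_t() once at program start would turn a wrong assumption into an immediate, obvious failure instead of silent offset corruption.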

On FreeBSD 7, you can safely do:

off_t          actual_offset;
unsigned long  stored_offset;

if (actual_offset >= (off_t)0 && actual_offset < (off_t)4294967296.0)
    stored_offset = (unsigned long)actual_offset;
else
    some_fatal_error("Unsupportable file offset!");

(On LP64 architectures, the above would be silly, as off_t and long would both be 64-bit signed integers. It would be safe even then; just silly, since all possible file offsets could be supported.)
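The same range check can also be written with an integer limit instead of a floating-point literal. The sketch below is an illustration only: store_offset is a hypothetical name, and int64_t and uint32_t stand in for off_t and the 32-bit unsigned long, so that it behaves the same on an LP64 machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a 64-bit file offset into a 32-bit unsigned slot.
   Returns 0 on success, or -1 (leaving *stored_offset untouched)
   if the offset is negative or does not fit in 32 bits. */
static int store_offset(int64_t actual_offset, uint32_t *stored_offset)
{
    assert(stored_offset != NULL);
    if (actual_offset < 0 || actual_offset > (int64_t)UINT32_MAX)
        return -1;
    *stored_offset = (uint32_t)actual_offset;
    return 0;
}
```

Returning an error code rather than aborting is just a choice made for this sketch; in the real program a fatal error, as above, may well be the safer reaction.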

What often bites people here is that the offset calculations must be done using off_t. That is, it is not enough to cast the result to off_t; you must cast the values used in the arithmetic to off_t. (Technically, you only need to make sure each arithmetic operation is done at off_t precision, but I find it easier to remember the rules if I just punt and cast all the operands.) For example:

off_t          offset;
unsigned long  some, values, used;

offset = (off_t)some * (off_t)values + (off_t)used;
fseeko(file, offset, SEEK_SET);
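To see why casting only the result is not enough, compare the two hypothetical helpers below. This is a sketch under two assumptions: off_t is 64 bits wide, and uint32_t stands in for the 32-bit unsigned long of the original platform. The parameter names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>  /* off_t */

/* Wrong: the multiplication and addition happen in 32-bit unsigned
   arithmetic and wrap modulo 2^32 before the result is widened. */
static off_t offset_narrow(uint32_t record, uint32_t size, uint32_t field)
{
    return (off_t)(record * size + field);
}

/* Right: every operand is widened first, so the arithmetic itself
   is carried out at off_t precision. */
static off_t offset_wide(uint32_t record, uint32_t size, uint32_t field)
{
    assert(sizeof (off_t) >= 8);  /* assumption of this sketch */
    return (off_t)record * (off_t)size + (off_t)field;
}
```

For record 5000000, size 1000 and field 16, offset_wide yields 5000000016, while offset_narrow yields 705032720, which is the same value reduced modulo 2^32.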

Usually the offset calculations are used to find a field in a specific record; the arithmetic tends to stay the same. I truly recommend you move the seek operations to a helper function, if possible:

int fseek_to(FILE *const file,
             const unsigned long some,
             const unsigned long values,
             const unsigned long used)
{
    const off_t  offset = (off_t)some * (off_t)values + (off_t)used;
    if (offset < (off_t)0 || offset >= (off_t)4294967296.0)
        fatal_error("Offset exceeds 4GB; I must abort!");
    return fseeko(file, offset, SEEK_SET);
}

Now, if you happen to be in the lucky position of knowing that all your offsets are aligned (to some integer, say 4), you can give yourself a couple more years to rewrite the application, by using an extension of the above:

#define BIG_N 4

int fseek_to(FILE *const file,
             const unsigned long some,
             const unsigned long values,
             const unsigned long used)
{
    const off_t  offset = (off_t)some * (off_t)values + (off_t)used;
    if (offset < (off_t)0)
        fatal_error("Offset is negative; I must abort!");
    if (offset >= (off_t)2147483648UL + (off_t)BIG_N * (off_t)2147483648UL)
        fatal_error("Offset is too large; I must abort!");
    if ((offset % BIG_N) && (offset >= (off_t)2147483648.0))
        fatal_error("Offset is not a multiple of BIG_N; I must abort!");
    return fseeko(file, offset, SEEK_SET);
}

int fseek_big(FILE *const file, const unsigned long position)
{
    off_t  offset;
    if (position >= 2147483648UL)
        offset = (off_t)2147483648UL
               + (off_t)BIG_N * (off_t)(position - 2147483648UL);
    else
        offset = (off_t)position;
    return fseeko(file, offset, SEEK_SET);
}

unsigned long ftell_big(FILE *const file)
{
    off_t  offset;
    offset = ftello(file);
    if (offset < (off_t)0)
        fatal_error("Offset is negative; I must abort!");
    if (offset < (off_t)2147483648UL)
        return (unsigned long)offset;
    if (offset % BIG_N)
        fatal_error("Offset is not a multiple of BIG_N; I must abort!");
    if (offset >= (off_t)2147483648UL + (off_t)BIG_N * (off_t)2147483648UL)
        fatal_error("Offset is too large; I must abort!");
    return (unsigned long)2147483648UL
         + (unsigned long)((offset - (off_t)2147483648UL) / (off_t)BIG_N);
}

The logic is simple: if the offset is less than 2^31, it is stored as-is. Otherwise, the stored value is 2^31 + (offset - 2^31) / BIG_N, which corresponds to the actual offset 2^31 + BIG_N × (value - 2^31). The only requirement is that offsets of 2^31 and above are always multiples of BIG_N.

Obviously, you then must use only the above three functions (plus whatever variants of fseek_to() you need, as long as they do the same checks and differ only in their parameters and in the formula for the offset calculation). That way you can support file sizes of up to 2147483648 + BIG_N × 2147483647. For BIG_N==4, that is 10 GiB (less 4 bytes; 10,737,418,236 bytes to be exact).
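The mapping itself can be checked in isolation, without touching any file. The following is a minimal sketch of just the arithmetic; position_to_offset and offset_to_position are hypothetical names, and int64_t and uint32_t stand in for off_t and the 32-bit unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIG_N 4
#define HALF  INT64_C(2147483648)   /* 2^31 */

/* Stored 32-bit position -> actual file offset. */
static int64_t position_to_offset(uint32_t position)
{
    if ((int64_t)position < HALF)
        return (int64_t)position;
    return HALF + (int64_t)BIG_N * ((int64_t)position - HALF);
}

/* Actual file offset -> stored 32-bit position.
   Returns 0 on success, -1 if the offset cannot be represented. */
static int offset_to_position(int64_t offset, uint32_t *position)
{
    assert(position != NULL);
    if (offset < 0)
        return -1;
    if (offset < HALF) {
        *position = (uint32_t)offset;
        return 0;
    }
    if ((offset - HALF) % BIG_N)        /* not aligned to BIG_N */
        return -1;
    if ((offset - HALF) / BIG_N >= HALF) /* would not fit in 32 bits */
        return -1;
    *position = (uint32_t)(HALF + (offset - HALF) / BIG_N);
    return 0;
}
```

With BIG_N set to 4, the largest stored position, 4294967295, maps to offset 10737418236, which matches the limit quoted above.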

Questions?


Edited to clarify:

Start with replacing your fseek(file, position, SEEK_SET) with calls to fseek_pos(file, position),

static inline void fseek_pos(FILE *const file, const unsigned long position)
{
    if (fseeko(file, (off_t)position, SEEK_SET))
        fatal_error("Cannot set file position!");
}

and fseek(file, position, SEEK_END) with calls to fseek_end(file, position) (for symmetry -- I'm assuming the position for this one is usually a literal integer constant),

static inline void fseek_end(FILE *const file, const off_t relative)
{
    if (fseeko(file, relative, SEEK_END))
        fatal_error("Cannot set file position!");
}

and finally, ftell(file) with calls to ftell_pos(file):

static inline unsigned long ftell_pos(FILE *const file)
{
    off_t position;
    position = ftello(file);
    if (position == (off_t)-1)
        fatal_error("Lost file position!");
    if (position < (off_t)0 || position >= (off_t)4294967296.0)
        fatal_error("File position outside the 4GB range!");
    return (unsigned long)position;
}

Since on your architecture and OS unsigned long is a 32-bit unsigned integer type and off_t is a 64-bit signed integer type, this gives you the full 4GB range.
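The common "seek to the end, then ask for the position" idiom from the question can be wrapped the same way. Here is a sketch, assuming a POSIX system with fseeko/ftello; file_size_u32 is a name invented for this example, and it returns an error code instead of calling fatal_error so that it can be tried in isolation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>  /* off_t */

/* Seeks to the end of the stream and reports the file size.
   Returns 0 on success, or -1 on an I/O error or if the size
   does not fit in 32 unsigned bits. The stream is left at EOF. */
static int file_size_u32(FILE *file, uint32_t *size)
{
    off_t end;

    assert(file != NULL && size != NULL);
    if (fseeko(file, (off_t)0, SEEK_END) != 0)
        return -1;
    end = ftello(file);
    if (end < (off_t)0 || (uint64_t)end > UINT32_MAX)
        return -1;
    *size = (uint32_t)end;
    return 0;
}
```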

For the offset calculations, define one or more functions similar to

static inline void fseek_to(FILE *const file, const off_t term1,
                                              const off_t term2,
                                              const off_t term3)
{
    const off_t position = term1 * term2 + term3;

    if (position < (off_t)0 || position >= (off_t)4294967296.0)
        fatal_error("File position outside the 4GB range!");
    if (fseeko(file, position, SEEK_SET))
        fatal_error("Cannot set file position!");
}

For each offset calculation algorithm, define one fseek_to variant. Name the parameters so that the arithmetic makes sense. Make the parameters const off_t, as above, so you don't need extra casts in the arithmetic. Only the parameters and the const off_t position = line defining the calculation algorithm vary between the variant functions.
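As an illustration, one such variant might look like the sketch below. The record/field layout and the name seek_to_field are assumptions made up for this example, not something taken from your program; it also assumes a 64-bit off_t and returns -1 instead of calling fatal_error so that it can be tested on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>  /* off_t */

#define MAX_POSITION ((off_t)0xFFFFFFFFUL)  /* last byte below 4 GiB */

/* Seeks to a field inside a fixed-size record.
   Returns 0 on success, -1 if the position is out of range
   or the seek itself fails. */
static int seek_to_field(FILE *file, const off_t record_index,
                                     const off_t record_size,
                                     const off_t field_offset)
{
    off_t position;

    assert(file != NULL);
    if (record_index < 0 || record_size < 0 || field_offset < 0)
        return -1;
    if (field_offset > MAX_POSITION)
        return -1;
    /* Reject the product before computing it, so it cannot overflow. */
    if (record_size != 0 && record_index > MAX_POSITION / record_size)
        return -1;
    position = record_index * record_size + field_offset;
    if (position > MAX_POSITION)
        return -1;
    return fseeko(file, position, SEEK_SET) ? -1 : 0;
}
```

The division-based check is an extra precaution added in this sketch: two large 32-bit values multiplied together can exceed what even a 64-bit signed off_t can hold.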

Questions?

Nominal Animal answered Sep 18 '22 01:09