Inconsistent behavior of fscanf() across different compilers (consuming trailing null character)

Tags: c, stdio, scanf, null-character

I wrote a complete application in C99 and tested it thoroughly on two GNU/Linux-based systems. I was surprised when compiling it with Visual Studio on Windows resulted in the application misbehaving. At first I couldn't ascertain what was wrong, but after stepping through it in the VC debugger I discovered a discrepancy concerning the fscanf() function declared in stdio.h.

The following code is sufficient to demonstrate the problem:

#include <stdio.h>

int main(void) {
    unsigned num1, num2, num3;

    FILE *file = fopen("file.bin", "rb");
    if (file == NULL) {
        perror("file.bin");
        return 1;
    }
    fscanf(file, "%u", &num1);
    fgetc(file); // consume and discard \0
    fscanf(file, "%u", &num2);
    fgetc(file); // ditto
    fscanf(file, "%u", &num3);
    fgetc(file); // ditto
    fclose(file);

    // %u matches the unsigned arguments (the original used %d)
    printf("%u, %u, %u\n", num1, num2, num3);

    return 0;
}

Assume that file.bin contains exactly 512\0256\0128\0:

$ hexdump -C file.bin
00000000  35 31 32 00 32 35 36 00  31 32 38 00              |512.256.128.|
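To reproduce the input, a helper along these lines could write the twelve bytes shown in the dump. The name write_sample is made up for this sketch; the byte sequence is the one given above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "512\0256\0128\0" (12 bytes) to an open
   binary stream and returns the number of bytes actually written. */
size_t write_sample(FILE *out)
{
    /* Adjacent string literals keep each "\0" escape from absorbing
       the digits that follow it as part of an octal escape. */
    static const char data[] = "512\0" "256\0" "128";

    assert(out != NULL);
    /* sizeof data is 12: the 11 bytes listed plus the implicit final NUL. */
    return fwrite(data, 1, sizeof data, out);
}
```

Calling it on a stream opened with fopen("file.bin", "wb") should produce the file described above.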

When compiled with GCC 4.8.4 on an Ubuntu machine, the resulting program reads the numbers as expected and prints 512, 256, 128 to stdout.
Compiling it with MinGW 4.8.1 on Windows gives the same expected result.

However, there seems to be a major difference when I compile the code using Visual Studio Community 2015; namely, the output is:

512, 56, 28

As you can see, the trailing null characters have already been consumed by fscanf(), so fgetc() captures and discards characters that are essential to data integrity.

Commenting out the fgetc() lines makes the code work in VC, but breaks it in GCC (and possibly other compilers).

What is going on here, and how do I turn this into portable C code? Have I hit undefined behavior? Note that I'm assuming the C99 standard.

asked Feb 23 '17 16:02

rhino

2 Answers

TL;DR: you've been bitten by MSVC non-conformance, a longstanding problem that MS has never shown much interest in solving. If you must support MSVC in addition to conforming C implementations, then one way to do so would be to engage conditional compilation directives to suppress the fgetc() calls when the program is compiled via MSVC.
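A minimal sketch of that conditional-compilation approach follows. The helper name discard_separator is hypothetical, and the sketch assumes that every build where _MSC_VER is defined links a runtime whose fscanf() behaves as described in the question, which you would want to verify for your toolchain.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: discards the NUL separator after a %u conversion.
   Assumption: when _MSC_VER is defined, fscanf() has already swallowed
   the NUL (the behavior reported in the question), so nothing is read. */
void discard_separator(FILE *stream)
{
    assert(stream != NULL);
#if defined(_MSC_VER)
    (void)stream;        /* separator assumed to be consumed already */
#else
    (void)fgetc(stream); /* conforming library: the NUL is still unread */
#endif
}
```

Each fgetc(file) call in the question would then become discard_separator(file).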

I'm inclined to agree with the comments that reading binary data via formatted I/O functions is a questionable plan. Even more questionable, however, is the combination of

compil[ing] it using Visual Studio on Windows

and

assuming the C99 standard.

As far as I am aware, no version of MSVC conforms to C99. Very recent versions may do a better job of conforming to C2011, in part because C2011 makes some features optional that were mandatory in C99.

Whichever version of MSVC you're using, however, I think it fails to conform to the standard (both C99 and C2011) in this area. Here is the relevant text from C99, section 7.19.6.2:

A conversion specification is executed in the following steps:

[...]

An input item is read from the stream [...]. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread.
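One way to observe this rule directly is a small probe like the one below. The function name char_after_u is invented for illustration; it only combines fscanf() and fgetc() as the question's program does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: converts one unsigned value with %u, then returns
   the next character fgetc() delivers. Returns EOF if the conversion
   fails or the input ends right after the number. */
int char_after_u(FILE *stream, unsigned *value)
{
    assert(stream != NULL && value != NULL);
    if (fscanf(stream, "%u", value) != 1)
        return EOF;
    return fgetc(stream);
}
```

For input beginning with 512\0256, a conforming library should return 0 here, because the NUL is the first character after the input item. The output reported in the question suggests the MSVC build would return '2' instead.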

The standard is quite clear that the first character that does not match the input sequence remains unread, so the only ways MSVC could be considered conforming are if the \0 characters could be construed as being part of (and terminating) a matching input sequence, or if fgetc() were permitted to skip \0 characters. I see no justification for the latter, especially given that the stream was opened in binary mode, so let's consider the former.

For a u conversion specifier, a matching input sequence is defined as one that

Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument.

The "subject sequence of the strtoul function" is defined in that function's specifications:

First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string.

Note in particular that the terminating null character is explicitly attributed to the final string of unrecognized characters. It is not part of the subject sequence, and therefore should not be matched by fscanf() when it converts input according to a u specifier.
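The same split can be seen with strtoul() itself. In this sketch, subject_end_offset is a hypothetical helper that reports where strtoul() stops parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the offset in s at which strtoul() stops,
   i.e. where the "final string" of unrecognized characters begins.
   Returns 0 when no conversion could be performed. */
size_t subject_end_offset(const char *s)
{
    char *end;

    assert(s != NULL);
    (void)strtoul(s, &end, 10); /* only the end pointer is of interest */
    return (size_t)(end - s);
}
```

For the string "512" the result is 3, the index of the terminating null character, so the subject sequence covers only the three digits.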

answered Sep 27 '22 18:09

John Bollinger

The MSVC implementation of fscanf is apparently "trashing" the NUL character next to the 512:

fscanf(file, "%u", &num1);

According to the fscanf documentation, this should not take place (emphasis mine):

For every conversion specifier other than n, the longest sequence of input characters which does not exceed any specified field width and which either is exactly what the conversion specifier expects or is a prefix of a sequence it would expect, is what's consumed from the stream. The first character, if any, after this consumed sequence remains unread.

Note that this is different from the case where you want to skip trailing whitespace characters, as in the following statement:

fscanf(file, "%u ", &num1); // notice "%u "

The spec says that this skipping applies only to characters for which isspace() is true, which, as I checked, does not hold here (that is, isspace('\0') yields 0).
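That check can be written as a tiny predicate. The name skipped_by_space_directive is hypothetical, and the result for '\0' assumes the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical check: reports whether a white-space directive in a scanf
   format would skip the character c, i.e. whether isspace(c) is nonzero. */
int skipped_by_space_directive(int c)
{
    /* isspace() is only defined for EOF and unsigned char values. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return isspace(c) != 0;
}
```

The predicate is true for ' ' and '\n' but false for '\0', so even "%u " should leave the NUL in the stream.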

A hacky, regex-like workaround that works in both MSVC and GCC may be to replace fgetc with:

fscanf(file, "%*1[^0-9+-]"); // skip at most one non-%u character

or, more portably, by replacing the implementation-defined 0-9 range with literal digits:

fscanf(file, "%*1[^0123456789+-]"); // skip at most one non-%u character
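A different technique with the same intent, not the scanset above, is to read one character and push it back with ungetc() unless it is the NUL separator. The helper name skip_one_nul is made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: consumes the next character only if it is NUL.
   Returns 1 if a NUL was skipped, 0 otherwise (including at end of file). */
int skip_one_nul(FILE *stream)
{
    int c;

    assert(stream != NULL);
    c = fgetc(stream);
    if (c == '\0')
        return 1;
    if (c != EOF)
        ungetc(c, stream); /* not a separator: put it back */
    return 0;
}
```

Because the standard guarantees one character of pushback, this should behave the same whether or not the preceding fscanf() call already consumed the separator.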

answered Sep 27 '22 17:09

Grzegorz Szpetkowski
