isspace() works if the input is representable as unsigned char or equal to EOF.
getchar() reads the next character from stdin.
When getchar() != EOF, are all values returned by getchar() representable as unsigned char?
    uintmax_t count_space = 0;
    for (int c; (c = getchar()) != EOF; )
        if (isspace(c))
            ++count_space;
Can this code lead to undefined behavior?
According to the C11 WG14 draft N1570:
§7.21.7.6/2 The getchar function is equivalent to getc with the argument stdin.
§7.21.7.5/2 The getc function is equivalent to fgetc ...
§7.21.7.1/2 [!=EOF case] ... the fgetc function obtains that character as an unsigned char converted to an int ...
(The text in [...] is mine.)
i.e.,
- isspace() accepts getchar() values
- getchar() != EOF values are representable as unsigned char
If you think it is too obvious ("what else can it be"), think again. For example, in the related case, isspace(CHAR_MIN) may be undefined, i.e., it may be undefined behavior to pass a character to a character classification function!
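The isspace(CHAR_MIN) case matters whenever characters come from a char array rather than from getchar(). A minimal sketch of the usual workaround, converting each char to unsigned char before classifying it, is shown here; the function name count_space_in_string is hypothetical and not part of the original question.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts whitespace characters in a NUL-terminated
   string without ever passing a negative value to isspace(). */
size_t count_space_in_string(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Plain char may be signed, so *s can be negative. Converting to
           unsigned char first keeps the argument in the accepted range. */
        if (isspace((unsigned char)*s))
            ++n;
    }
    return n;
}
```

For instance, count_space_in_string("a b\tc\n") visits one space, one tab and one newline.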
If UCHAR_MAX > INT_MAX, the result may be implementation-defined:
§6.3.1.3/3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
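Code that relies on every unsigned char value surviving the conversion to int could state that assumption explicitly. The following is only a sketch: it uses a C11 _Static_assert so that compilation fails on an implementation where UCHAR_MAX > INT_MAX; the helper name uchar_to_int is hypothetical.

```c
#include <limits.h>

/* Refuse to compile where some unsigned char values do not fit in int,
   i.e. where the conversion quoted above would be implementation-defined. */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "every unsigned char value must be representable as int");

/* Hypothetical helper: with the assertion above in place, this conversion
   always preserves the value. */
int uchar_to_int(unsigned char uc)
{
    return uc;
}
```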
The return value of getchar() is of the same format as fgetc(). C11 defines the return value of fgetc() in 7.21.7.1p2-3:
- If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
Returns
- If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. [289]
Since this is an unsigned char converted to an int, the int will almost always have the same value as the unsigned char.
This might not hold for high values on platforms where sizeof(int) == 1; these, however, are mostly DSP platforms, so it is almost certain that character classification is not needed on them.
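On such an unusual platform a valid character could, in principle, compare equal to EOF. A hedged sketch of a loop that does not depend on EOF being out of band is given here: it treats EOF as the end of input only when feof() or ferror() confirms it. The name count_space_stream is hypothetical, and skipping negative values is an assumption made for this sketch, since they are outside the domain of isspace().

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: counts the whitespace characters remaining on fp. */
uintmax_t count_space_stream(FILE *fp)
{
    uintmax_t count = 0;
    for (;;) {
        int c = getc(fp);
        /* Stop only if the stream itself reports end-of-file or an error,
           not merely because the returned value equals EOF. */
        if (c == EOF && (feof(fp) || ferror(fp)))
            break;
        /* Assumption: negative values cannot be classified, so skip them.
           On ordinary platforms c is never negative at this point. */
        if (c >= 0 && isspace(c))
            ++count;
    }
    return count;
}
```

Calling count_space_stream(stdin) gives the same count as the getchar() loop from the question on ordinary platforms.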
The is* functions are carefully defined so that they can be used directly with the return value of *getc*. C11 7.4p1:
1 The header <ctype.h> declares several functions useful for classifying and mapping characters. [198] In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
i.e., it is legal to pass even EOF to the is* functions. Of course, isanything(EOF) will always return 0; therefore, to count consecutive whitespace characters one could simply use something like:
    while (isspace(getchar())) space_count++;
However, signed char values are not OK; for example, the MSVC debug C library is known to abort if a negative value other than EOF is passed to any of the character classification functions.
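The rule from 7.4p1 can be written down as a small predicate, which may be handy in debug checks. This is only an illustrative sketch; in_ctype_domain is a hypothetical name, not a standard function.

```c
#include <limits.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical helper: returns nonzero if v may legally be passed to the
   <ctype.h> functions, i.e. it equals EOF or lies in 0..UCHAR_MAX. */
int in_ctype_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

Every value returned by getchar() on an ordinary platform satisfies this predicate, whereas a negative plain char value other than EOF does not.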