 

Does isspace() accept getchar() values?

isspace() is defined only if its argument is representable as an unsigned char or equals EOF.

getchar() reads the next character from stdin.

When getchar() != EOF, are all values returned by getchar() representable as unsigned char?

uintmax_t count_space = 0;
for (int c; (c = getchar()) != EOF; )
  if (isspace(c))
    ++count_space;

Can this code lead to undefined behavior?

asked Dec 18 '22 03:12 by jfs


2 Answers

According to C11 WG14 draft version N1570:

§7.21.7.6/2 The getchar function is equivalent to getc with the argument stdin.

§7.21.7.5/2 The getc function is equivalent to fgetc...

§7.21.7.1/2 [!=EOF case] ...the fgetc function obtains that character as an unsigned char converted to an int... (text in [...] is mine)

i.e.,

  • isspace() accepts getchar() values
  • all getchar()!=EOF values are representable as unsigned char
  • there is no undefined behavior here.
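As a minimal sketch of that conclusion, the question's loop can be wrapped in a function that reads from any stream. The name count_spaces is hypothetical, and getc() is used because getchar() is specified as getc(stdin):

```c
#include <ctype.h>   /* isspace */
#include <stdint.h>  /* uintmax_t */
#include <stdio.h>   /* FILE, getc, EOF */

/* Hypothetical helper: counts whitespace characters in a stream.
   Every value getc() returns is either EOF or an unsigned char
   converted to int, so it can be passed to isspace() unchanged. */
static uintmax_t count_spaces(FILE *fp)
{
    uintmax_t count = 0;
    for (int c; (c = getc(fp)) != EOF; )
        if (isspace(c))
            ++count;
    return count;
}
```

Calling count_spaces(stdin) behaves like the loop in the question.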

If you think it is too obvious ("what else could it be?"), think again. For example, in a related case, isspace(CHAR_MIN) may be undefined: where char is signed, CHAR_MIN is negative and usually differs from EOF, i.e., it may be undefined behavior to pass a char value directly to a character classification function!
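A common way to avoid that trap, shown here as a sketch with a hypothetical helper name, is to convert a char to unsigned char before classifying it. The conversion maps negative char values into the range that isspace() is defined for:

```c
#include <ctype.h>   /* isspace */
#include <limits.h>  /* CHAR_MIN, for the usage shown below */

/* Hypothetical helper: classifies a plain char safely. Converting to
   unsigned char first means a negative char (possible when char is
   signed) never reaches isspace() as an out-of-range int. */
static int is_space_char(char ch)
{
    return isspace((unsigned char)ch);
}

/* Usage: is_space_char(CHAR_MIN) is well defined, whereas
   isspace(CHAR_MIN) may not be. */
```

The cast is only needed for char values; values returned by getchar() are already in the valid range.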

If UCHAR_MAX > INT_MAX, converting a large unsigned char value to int may give an implementation-defined result:

§6.3.1.3/3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
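Code that relies on getchar() results being unambiguous could, as a sketch assuming a C11 compiler, reject such platforms at compile time:

```c
#include <limits.h>  /* UCHAR_MAX, INT_MAX */

/* C11 compile-time check: on platforms where int cannot represent every
   unsigned char value (e.g. sizeof(int) == 1), the unsigned char -> int
   conversion in fgetc() is implementation-defined and a byte may become
   indistinguishable from EOF. */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "int cannot represent every unsigned char value");
```

On mainstream hosted platforms the assertion holds and the declaration has no runtime cost.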

like image 73
jfs Avatar answered Dec 28 '22 23:12

jfs


getchar() returns values in the same form as fgetc(). C11 defines the return value of fgetc() in 7.21.7.1p2-3:

2 If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

Returns

3 If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. [289]

Since this is an unsigned char converted to an int, the int will almost always have the same value as the unsigned char.
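This can be checked empirically with a sketch (hypothetical function name; it assumes UCHAR_MAX <= INT_MAX, otherwise the loop counter would overflow) that writes every byte value to a temporary binary file and reads them back with fgetc():

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* tmpfile, fputc, rewind, fgetc, fclose, EOF */

/* Hypothetical check: returns 1 if fgetc() yields exactly 0..UCHAR_MAX
   for the bytes 0..UCHAR_MAX followed by EOF, 0 otherwise or on I/O
   failure. Assumes UCHAR_MAX <= INT_MAX. */
static int fgetc_round_trips_all_bytes(void)
{
    FILE *fp = tmpfile();            /* opened in "wb+" (binary) mode */
    if (fp == NULL)
        return 0;
    for (int b = 0; b <= UCHAR_MAX; ++b) {
        if (fputc(b, fp) == EOF) {
            fclose(fp);
            return 0;
        }
    }
    rewind(fp);                      /* required between writing and reading */
    int ok = 1;
    for (int b = 0; b <= UCHAR_MAX; ++b) {
        int c = fgetc(fp);
        if (c != b)                  /* also catches an unexpected EOF */
            ok = 0;
    }
    if (fgetc(fp) != EOF)            /* nothing should remain */
        ok = 0;
    fclose(fp);
    return ok;
}
```

A test like this only confirms the behavior of one implementation; the guarantee itself comes from the wording of 7.21.7.1.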

This might not hold for large values on platforms where sizeof(int) == 1 (which implies UCHAR_MAX > INT_MAX); however, these are mostly DSP platforms, where character classification is almost certainly not needed.


The is* functions are carefully defined so that they can be used directly with the return value of getc (C11 7.4p1):

1 The header <ctype.h> declares several functions useful for classifying and mapping characters. [198] In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

i.e., it is even legal to pass EOF to the is* functions. Since each of them returns 0 for EOF, to count consecutive whitespace characters one could simply write something like:

while (isspace(getchar())) space_count++;
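Note that this one-liner also consumes the first non-whitespace character. A sketch that keeps it available for the next read, using a hypothetical function name and the standard ungetc(), might look like this:

```c
#include <ctype.h>   /* isspace */
#include <stdio.h>   /* FILE, getc, ungetc, EOF */

/* Hypothetical helper: counts consecutive whitespace characters at the
   current position of fp, then pushes the first non-whitespace
   character back so the next read sees it. */
static unsigned long skip_spaces(FILE *fp)
{
    unsigned long count = 0;
    int c;
    while (isspace(c = getc(fp)))    /* getc() values, including EOF, are valid */
        ++count;
    ungetc(c, fp);                   /* fails harmlessly (no-op) when c == EOF */
    return count;
}
```

Passing EOF to ungetc() is allowed: the call fails and leaves the stream unchanged, so no special case is needed at end of file.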

However, passing a plain or signed char that holds a negative value is not OK; for example, the MSVC C runtime debug library is known to abort if a negative value other than EOF is passed to any of the character classification functions.