 

Why do ctype functions take int but want unsigned char/EOF?

Tags:

c

I am using gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1

The man page for isalnum() says:

SYNOPSIS
       #include <ctype.h>

       int isalnum(int c);

However, it also says:

These functions check whether c, which must have the value of an unsigned char or EOF, ...

I have found that isalnum() will blow up for very large positive (or negative) int values (but it handles all short int values).

Is the man page saying the int passed in must have a value of an unsigned char because the C library writers are reserving the right to implement isalnum() in a way that will not handle all int values without blowing up?

Scooter asked Jul 24 '12 04:07



1 Answer

The C standard says as much...

In ISO/IEC 9899:1999 (the old C standard), it says:

§7.4 Character handling

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

(I've left out a footnote.) Both C89 and C11 say very much the same thing.

One common implementation uses an array offset by 1, a variation on the theme of:

/* Slot 0 holds EOF (-1); slots 1..256 hold the byte values 0..255 */
int _Ctype_bits[257] = { ... };

#define isalpha(c)  (_Ctype_bits[(c)+1]&_ALPHA)

As long as c is in the range of values that an unsigned char can store (and there are 8 bits per character, EOF is -1, and the table is initialized correctly), this works beautifully. Note that the macro expansion uses the argument only once, which is another requirement of the standard. But if you pass arbitrary values outside the stipulated range, you access random memory (or, at the least, memory that is not initialized to contain the correct information). A very large positive or negative argument indexes far outside the array, which is consistent with the crashes described in the question.

Jonathan Leffler answered Oct 06 '22 14:10
