isdigit(c) - a char or int type?

Tags:

c++

I have written the following code to test if the given input is a digit or not.

#include<iostream>
#include<ctype.h>
#include<stdio.h>
using namespace std;

int main()
{
    char c;

    cout<<"Please enter a digit: ";
    cin>>c;

    if(isdigit(c)) //int isdigit(int c) or char isdigit(char c)
    {
        cout<<"You entered a digit"<<endl;
    }
    else
    {
        cout<<"You entered a non-digit value"<<endl;
    }
}

My question is: what should be the input variable type? char or int?

asked Jul 10 '17 by Omama Khan


1 Answer

The situation is unfortunately a bit more complex than the other answers suggest.

First of all: the first part of your code is correct (disregarding multi-byte encodings); if you want to read a single char with cin, you'll have to use a char variable with the >> operator.

Now, about isdigit: why does it take an int instead of a char?

It all comes from C; isdigit and its companions were born to be used along with functions like getchar(), which read a character from the stream and return an int. This in turn was done so the return value can carry either the character or an error code: getchar() can return EOF (which is defined as some implementation-defined negative constant) to signal that the input stream has ended.
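
As an aside, the usual getchar() read loop looks something like the minimal sketch below; the receiving variable is an int precisely so it can hold both every (unsigned-char-sized) character value and the negative EOF constant:

#include <cstdio>

int main()
{
    int ch;                                // int, not char: it must be able to hold EOF too

    while ((ch = std::getchar()) != EOF)   // EOF is a negative "no more input" marker
    {
        std::putchar(ch);                  // echo each character back
    }
}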

So, the basic idea is: negative = error code; positive = actual character code.

Unfortunately, this poses interoperability problems with "regular" chars.

Short digression: char ultimately is just an integral type with a very small range, but a particularly stupid one. On most occasions - when working with bytes or character codes - you'd want it to be unsigned by default; OTOH, for coherence with the other integral types (int, short, long, ...), you could argue that plain char ought to be signed. The Standard chose the most stupid way: plain char is either signed or unsigned, depending on whatever the implementor of the compiler decides¹.

So, you have to be prepared for char being either signed or unsigned; in most implementations it's signed by default, which poses a problem with the getchar() arrangement above.

If char is used to read bytes and is signed, it means that all bytes with the high bit set (i.e. bytes that, read into an unsigned 8-bit type, would be >127) turn out to be negative values. This obviously isn't compatible with getchar() using negative values for EOF - actual "negative" characters could overlap with EOF.
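
Here is a small sketch of the problem; the exact numbers assume an 8-bit, signed-by-default plain char (typical on x86, but not guaranteed):

#include <iostream>

int main()
{
    char c = static_cast<char>(0xE9);   // e.g. 'é' in Latin-1: a byte with the high bit set

    // If plain char is signed this typically prints -23; if unsigned, 233.
    std::cout << static_cast<int>(c) << '\n';

    // Cast to unsigned char first: the value always lands in 0..255,
    // so it can never collide with a negative EOF.
    std::cout << static_cast<int>(static_cast<unsigned char>(c)) << '\n';
}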

So, when C functions talk about receiving/providing characters in int variables, the contract is always that the character is assumed to be a char that has been cast to an unsigned char (so that it is always positive, with negative values wrapping into the top half of the range) and then put into an int. Which brings us back to the isdigit function, which, along with its companion functions, has this contract as well:

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

(C99, §7.4, ¶1)

So, long story short: your if should be at the very least:

if(isdigit((unsigned char)c))

The problem is not just a theoretical one: several widespread C library implementations use the provided value directly as an index into a lookup table, so negative values will read from unallocated memory and segfault your program.

Also, you are not taking into account the fact that the stream may be closed, in which case >> returns without touching your variable (which would then still be uninitialized); to handle this, you should check that the stream is still in a valid state before working on c.
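
Putting the two fixes together, a corrected sketch of your program might look like this (switching to <cctype> and std:: qualification is only a stylistic choice):

#include <iostream>
#include <cctype>

int main()
{
    char c;

    std::cout << "Please enter a digit: ";

    if (!(std::cin >> c))       // extraction failed (e.g. stream closed): c was never written to
    {
        std::cout << "No input available" << std::endl;
        return 1;
    }

    if (std::isdigit(static_cast<unsigned char>(c)))   // cast keeps the argument in unsigned char range
    {
        std::cout << "You entered a digit" << std::endl;
    }
    else
    {
        std::cout << "You entered a non-digit value" << std::endl;
    }
}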


  ¹ Of course this is a bit of an unfair rant; as @Pete Becker noted in the comment below, it's not like they were all morons, but just that the standard mostly tried to be compatible with existing implementations, which were probably evenly split between unsigned and signed char. Traces of this split can be found in most modern compilers, which can generally change the signedness of plain char through command-line options (-fsigned-char/-funsigned-char for gcc/clang, /J in VC++).
answered Nov 03 '22 by Matteo Italia