The real problem with scanf has a completely different nature, even though it is also about overflow. When scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior.
In C programming, scanf() is one of the commonly used function to take input from the user. The scanf() function reads formatted input from the standard input such as keyboards.
The most common ways of reading input are:
using fgets
with a fixed size, which is what is usually suggested, and
using fgetc
, which may be useful if you're only reading a single char
.
To convert the input, there are a variety of functions that you can use:
strtoll
, to convert a string into an integer
strtof
/d
/ld
, to convert a string into a floating-point number
sscanf
, which is not as bad as simply using scanf
, although it does have most of the downfalls mentioned below
There are no good ways to parse a delimiter-separated input in plain ANSI C. Either use strtok_r
from POSIX or strtok
, which is not thread-safe. You could also roll your own thread-safe variant using strcspn
and strspn
, as strtok_r
doesn't involve any special OS support.
It may be overkill, but you can use lexers and parsers (flex
and bison
being the most common examples).
No conversion, simply just use the string
Since I didn't go into exactly why scanf
is bad in my question, I'll elaborate:
With the conversion specifiers %[...]
and %c
, scanf
does not eat up whitespace. This is apparently not widely known, as evidenced by the many duplicates of this question.
There is some confusion about when to use the unary &
operator when referring to scanf
's arguments (specifically with strings).
It's very easy to ignore the return value from scanf
. This could easily cause undefined behavior from reading an uninitialized variable.
It's very easy to forget to prevent buffer overflow in scanf
. scanf("%s", str)
is just as bad as, if not worse than, gets
.
You cannot detect overflow when converting integers with scanf
. In fact, overflow causes undefined behavior in these functions.
fgets
is for getting the input. sscanf
is for parsing it afterwards. scanf
tries to do both at the same time. That's a recipe for trouble. Read first and parse later.
scanf
bad?The main problem is that scanf
was never intended to deal with user input. It's intended to be used with "perfectly" formatted data. I quoted the word "perfectly" because it's not completely true. But it is not designed to parse data that are as unreliable as user input. By nature, user input is not predictable. Users misunderstands instructions, makes typos, accidentally press enter before they are done etc. One might reasonably ask why a function that should not be used for user input reads from stdin
. If you are an experienced *nix user the explanation will not come as a surprise but it might confuse Windows users. In *nix systems, it is very common to build programs that work via piping, which means that you send the output of one program to another by piping the stdout
of the first program to the stdin
of the second. This way, you can make sure that the output and input are predictable. During these circumstances, scanf
actually works well. But when working with unpredictable input, you risk all sorts of trouble.
So why aren't there any easy-to-use standard functions for user input? One can only guess here, but I assume that old hardcore C hackers simply thought that the existing functions were good enough, even though they are very clunky. Also, when you look at typical terminal applications they very rarely read user input from stdin
. Most often you pass all the user input as command line arguments. Sure, there are exceptions, but for most applications, user input is a very minor thing.
First of all, gets
is NOT an alternative. It's dangerous and should NEVER be used. Read here why: Why is the gets function so dangerous that it should not be used?
My favorite is fgets
in combination with sscanf
. I once wrote an answer about that, but I will re-post the complete code. Here is an example with decent (but not perfect) error checking and parsing. It's good enough for debugging purposes.
Note
I don't particularly like asking the user to input two different things on one single line. I only do that when they belong to each other in a natural way. Like for instance
printf("Enter the price in the format <dollars>.<cent>: "); fgets(buffer, bsize, stdin);
and then usesscanf(buffer "%d.%d", &dollar, ¢)
. I would never do something likeprintf("Enter height and base of the triangle: ")
. The main point of usingfgets
below is to encapsulate the inputs to ensure that one input does not affect the next.
#define bsize 100
void error_function(const char *buffer, int no_conversions) {
fprintf(stderr, "An error occurred. You entered:\n%s\n", buffer);
fprintf(stderr, "%d successful conversions", no_conversions);
exit(EXIT_FAILURE);
}
char c, buffer[bsize];
int x,y;
float f, g;
int r;
printf("Enter two integers: ");
fflush(stdout); // Make sure that the printf is executed before reading
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%d%d", &x, &y)) != 2) error_function(buffer, r);
// Unless the input buffer was to small we can be sure that stdin is empty
// when we come here.
printf("Enter two floats: ");
fflush(stdout);
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%f%f", &f, &g)) != 2) error_function(buffer, r);
// Reading single characters can be especially tricky if the input buffer
// is not emptied before. But since we're using fgets, we're safe.
printf("Enter a char: ");
fflush(stdout);
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%c", &c)) != 1) error_function(buffer, r);
printf("You entered %d %d %f %c\n", x, y, f, c);
If you do a lot of these, I could recommend creating a wrapper that always flushes:
int printfflush (const char *format, ...) { va_list arg; int done; va_start (arg, format); done = vfprintf (stdout, format, arg); fflush(stdout); va_end (arg); return done; }
Doing like this will eliminate a common problem, which is the trailing newline that can mess with the nest input. But it has another issue, which is if the line is longer than bsize
. You can check that with if(buffer[strlen(buffer)-1] != '\n')
. If you want to remove the newline, you can do that with buffer[strcspn(buffer, "\n")] = 0
.
In general, I would advise to not expect the user to enter input in some weird format that you should parse to different variables. If you want to assign the variables height
and width
, don't ask for both at the same time. Allow the user to press enter between them. Also, this approach is very natural in one sense. You will never get the input from stdin
until you hit enter, so why not always read the whole line? Of course this can still lead to issues if the line is longer than the buffer. Did I remember to mention that user input is clunky in C? :)
To avoid problems with lines longer than the buffer you can use a function that automatically allocates a buffer of appropriate size, you can use getline()
. The drawback is that you will need to free
the result afterwards. This function is not guaranteed to exist by the standard, but POSIX has it. You could also implement your own, or find one on SO. How can I read an input string of unknown length?
If you're serious about creating programs in C with user input, I would recommend having a look at a library like ncurses
. Because then you likely also want to create applications with some terminal graphics. Unfortunately, you will lose some portability if you do that, but it gives you far better control of user input. For instance, it gives you the ability to read a key press instantly instead of waiting for the user to press enter.
Here is a rant about scanf
: https://web.archive.org/web/20201112034702/http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html
scanf
is awesome when you know your input is always well-structured and well-behaved. Otherwise...
IMO, here are the biggest problems with scanf
:
Risk of buffer overflow - if you do not specify a field width for the %s
and %[
conversion specifiers, you risk a buffer overflow (trying to read more input than a buffer is sized to hold). Unfortunately, there's no good way to specify that as an argument (as with printf
) - you have to either hardcode it as part of the conversion specifier or do some macro shenanigans.
Accepts inputs that should be rejected - If you're reading an input with the %d
conversion specifier and you type something like 12w4
, you would expect scanf
to reject that input, but it doesn't - it successfully converts and assigns the 12
, leaving w4
in the input stream to foul up the next read.
So, what should you use instead?
I usually recommend reading all interactive input as text using fgets
- it allows you to specify a maximum number of characters to read at a time, so you can easily prevent buffer overflow:
char input[100];
if ( !fgets( input, sizeof input, stdin ) )
{
// error reading from input stream, handle as appropriate
}
else
{
// process input buffer
}
One quirk of fgets
is that it will store the trailing newline in the buffer if there's room, so you can do an easy check to see if someone typed in more input than you were expecting:
char *newline = strchr( input, '\n' );
if ( !newline )
{
// input longer than we expected
}
How you deal with that is up to you - you can either reject the whole input out of hand, and slurp up any remaining input with getchar
:
while ( getchar() != '\n' )
; // empty loop
Or you can process the input you got so far and read again. It depends on the problem you're trying to solve.
To tokenize the input (split it up based on one or more delimiters), you can use strtok
, but beware - strtok
modifies its input (it overwrites delimiters with the string terminator), and you can't preserve its state (i.e., you can't partially tokenize one string, then start to tokenize another, then pick up where you left off in the original string). There's a variant, strtok_s
, that preserves the state of the tokenizer, but AFAIK its implementation is optional (you'll need to check that __STDC_LIB_EXT1__
is defined to see if it's available).
Once you've tokenized your input, if you need to convert strings to numbers (i.e., "1234"
=> 1234
), you have options. strtol
and strtod
will convert string representations of integers and real numbers to their respective types. They also allow you to catch the 12w4
issue I mentioned above - one of their arguments is a pointer to the first character not converted in the string:
char *text = "12w4";
char *chk;
long val;
long tmp = strtol( text, &chk, 10 );
if ( !isspace( *chk ) && *chk != 0 )
// input is not a valid integer string, reject the entire input
else
val = tmp;
What can I use to parse input instead of scanf?
Instead of scanf(some_format, ...)
, consider fgets()
with sscanf(buffer, some_format_and %n, ...)
By using " %n"
, code can simply detect if all the format was successfully scanned and that no extra non-white-space junk was at the end.
// scanf("%d %f fred", &some_int, &some_float);
#define EXPECTED_LINE_MAX 100
char buffer[EXPECTED_LINE_MAX * 2]; // Suggest 2x, no real need to be stingy.
if (fgets(buffer, sizeof buffer, stdin)) {
int n = 0;
// add -------------> " %n"
sscanf(buffer, "%d %f fred %n", &some_int, &some_float, &n);
// Did scan complete, and to the end?
if (n > 0 && buffer[n] == '\0') {
// success, use `some_int, some_float`
} else {
; // Report bad input and handle desired.
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With