Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What can I use for input conversion instead of scanf?

Tags:

c

scanf

People also ask

Why should I not use scanf?

The real problem with scanf has a completely different nature, even though it is also about overflow. When scanf function is used for converting decimal representations of numbers into values of arithmetic types, it provides no protection from arithmetic overflow. If overflow happens, scanf produces undefined behavior.

Is it necessary to use scanf in C?

In C programming, scanf() is one of the commonly used function to take input from the user. The scanf() function reads formatted input from the standard input such as keyboards.


The most common ways of reading input are:

  • using fgets with a fixed size, which is what is usually suggested, and

  • using fgetc, which may be useful if you're only reading a single char.

To convert the input, there are a variety of functions that you can use:

  • strtoll, to convert a string into an integer

  • strtof/d/ld, to convert a string into a floating-point number

  • sscanf, which is not as bad as simply using scanf, although it does have most of the downfalls mentioned below

  • There are no good ways to parse a delimiter-separated input in plain ANSI C. Either use strtok_r from POSIX or strtok, which is not thread-safe. You could also roll your own thread-safe variant using strcspn and strspn, as strtok_r doesn't involve any special OS support.

  • It may be overkill, but you can use lexers and parsers (flex and bison being the most common examples).

  • No conversion, simply just use the string


Since I didn't go into exactly why scanf is bad in my question, I'll elaborate:

  • With the conversion specifiers %[...] and %c, scanf does not eat up whitespace. This is apparently not widely known, as evidenced by the many duplicates of this question.

  • There is some confusion about when to use the unary & operator when referring to scanf's arguments (specifically with strings).

  • It's very easy to ignore the return value from scanf. This could easily cause undefined behavior from reading an uninitialized variable.

  • It's very easy to forget to prevent buffer overflow in scanf. scanf("%s", str) is just as bad as, if not worse than, gets.

  • You cannot detect overflow when converting integers with scanf. In fact, overflow causes undefined behavior in these functions.



TL;DR

fgets is for getting the input. sscanf is for parsing it afterwards. scanf tries to do both at the same time. That's a recipe for trouble. Read first and parse later.

Why is scanf bad?

The main problem is that scanf was never intended to deal with user input. It's intended to be used with "perfectly" formatted data. I quoted the word "perfectly" because it's not completely true. But it is not designed to parse data that are as unreliable as user input. By nature, user input is not predictable. Users misunderstands instructions, makes typos, accidentally press enter before they are done etc. One might reasonably ask why a function that should not be used for user input reads from stdin. If you are an experienced *nix user the explanation will not come as a surprise but it might confuse Windows users. In *nix systems, it is very common to build programs that work via piping, which means that you send the output of one program to another by piping the stdout of the first program to the stdin of the second. This way, you can make sure that the output and input are predictable. During these circumstances, scanf actually works well. But when working with unpredictable input, you risk all sorts of trouble.

So why aren't there any easy-to-use standard functions for user input? One can only guess here, but I assume that old hardcore C hackers simply thought that the existing functions were good enough, even though they are very clunky. Also, when you look at typical terminal applications they very rarely read user input from stdin. Most often you pass all the user input as command line arguments. Sure, there are exceptions, but for most applications, user input is a very minor thing.

So what can you do?

First of all, gets is NOT an alternative. It's dangerous and should NEVER be used. Read here why: Why is the gets function so dangerous that it should not be used?

My favorite is fgets in combination with sscanf. I once wrote an answer about that, but I will re-post the complete code. Here is an example with decent (but not perfect) error checking and parsing. It's good enough for debugging purposes.

Note

I don't particularly like asking the user to input two different things on one single line. I only do that when they belong to each other in a natural way. Like for instance printf("Enter the price in the format <dollars>.<cent>: "); fgets(buffer, bsize, stdin); and then use sscanf(buffer "%d.%d", &dollar, &cent). I would never do something like printf("Enter height and base of the triangle: "). The main point of using fgets below is to encapsulate the inputs to ensure that one input does not affect the next.

#define bsize 100

void error_function(const char *buffer, int no_conversions) {
        fprintf(stderr, "An error occurred. You entered:\n%s\n", buffer);
        fprintf(stderr, "%d successful conversions", no_conversions);
        exit(EXIT_FAILURE);
}

char c, buffer[bsize];
int x,y;
float f, g;
int r;

printf("Enter two integers: ");
fflush(stdout); // Make sure that the printf is executed before reading
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%d%d", &x, &y)) != 2) error_function(buffer, r);

// Unless the input buffer was to small we can be sure that stdin is empty
// when we come here.
printf("Enter two floats: ");
fflush(stdout);
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%f%f", &f, &g)) != 2) error_function(buffer, r);

// Reading single characters can be especially tricky if the input buffer
// is not emptied before. But since we're using fgets, we're safe.
printf("Enter a char: ");
fflush(stdout);
if(! fgets(buffer, bsize, stdin)) error_function(buffer, 0);
if((r = sscanf(buffer, "%c", &c)) != 1) error_function(buffer, r);

printf("You entered %d %d %f %c\n", x, y, f, c);

If you do a lot of these, I could recommend creating a wrapper that always flushes:

int printfflush (const char *format, ...)
{
   va_list arg;
   int done;
   va_start (arg, format);
   done = vfprintf (stdout, format, arg);
   fflush(stdout);
   va_end (arg);
   return done;
}

Doing like this will eliminate a common problem, which is the trailing newline that can mess with the nest input. But it has another issue, which is if the line is longer than bsize. You can check that with if(buffer[strlen(buffer)-1] != '\n'). If you want to remove the newline, you can do that with buffer[strcspn(buffer, "\n")] = 0.

In general, I would advise to not expect the user to enter input in some weird format that you should parse to different variables. If you want to assign the variables height and width, don't ask for both at the same time. Allow the user to press enter between them. Also, this approach is very natural in one sense. You will never get the input from stdin until you hit enter, so why not always read the whole line? Of course this can still lead to issues if the line is longer than the buffer. Did I remember to mention that user input is clunky in C? :)

To avoid problems with lines longer than the buffer you can use a function that automatically allocates a buffer of appropriate size, you can use getline(). The drawback is that you will need to free the result afterwards. This function is not guaranteed to exist by the standard, but POSIX has it. You could also implement your own, or find one on SO. How can I read an input string of unknown length?

Stepping up the game

If you're serious about creating programs in C with user input, I would recommend having a look at a library like ncurses. Because then you likely also want to create applications with some terminal graphics. Unfortunately, you will lose some portability if you do that, but it gives you far better control of user input. For instance, it gives you the ability to read a key press instantly instead of waiting for the user to press enter.

Interesting reading

Here is a rant about scanf: https://web.archive.org/web/20201112034702/http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html


scanf is awesome when you know your input is always well-structured and well-behaved. Otherwise...

IMO, here are the biggest problems with scanf:

  • Risk of buffer overflow - if you do not specify a field width for the %s and %[ conversion specifiers, you risk a buffer overflow (trying to read more input than a buffer is sized to hold). Unfortunately, there's no good way to specify that as an argument (as with printf) - you have to either hardcode it as part of the conversion specifier or do some macro shenanigans.

  • Accepts inputs that should be rejected - If you're reading an input with the %d conversion specifier and you type something like 12w4, you would expect scanf to reject that input, but it doesn't - it successfully converts and assigns the 12, leaving w4 in the input stream to foul up the next read.

So, what should you use instead?

I usually recommend reading all interactive input as text using fgets - it allows you to specify a maximum number of characters to read at a time, so you can easily prevent buffer overflow:

char input[100];
if ( !fgets( input, sizeof input, stdin ) )
{
  // error reading from input stream, handle as appropriate
}
else
{
  // process input buffer
}

One quirk of fgets is that it will store the trailing newline in the buffer if there's room, so you can do an easy check to see if someone typed in more input than you were expecting:

char *newline = strchr( input, '\n' );
if ( !newline )
{
  // input longer than we expected
}

How you deal with that is up to you - you can either reject the whole input out of hand, and slurp up any remaining input with getchar:

while ( getchar() != '\n' ) 
  ; // empty loop

Or you can process the input you got so far and read again. It depends on the problem you're trying to solve.

To tokenize the input (split it up based on one or more delimiters), you can use strtok, but beware - strtok modifies its input (it overwrites delimiters with the string terminator), and you can't preserve its state (i.e., you can't partially tokenize one string, then start to tokenize another, then pick up where you left off in the original string). There's a variant, strtok_s, that preserves the state of the tokenizer, but AFAIK its implementation is optional (you'll need to check that __STDC_LIB_EXT1__ is defined to see if it's available).

Once you've tokenized your input, if you need to convert strings to numbers (i.e., "1234" => 1234), you have options. strtol and strtod will convert string representations of integers and real numbers to their respective types. They also allow you to catch the 12w4 issue I mentioned above - one of their arguments is a pointer to the first character not converted in the string:

char *text = "12w4";
char *chk;
long val;
long tmp = strtol( text, &chk, 10 );
if ( !isspace( *chk ) && *chk != 0 )
  // input is not a valid integer string, reject the entire input
else
  val = tmp;

What can I use to parse input instead of scanf?

Instead of scanf(some_format, ...), consider fgets() with sscanf(buffer, some_format_and %n, ...)

By using " %n", code can simply detect if all the format was successfully scanned and that no extra non-white-space junk was at the end.

// scanf("%d %f fred", &some_int, &some_float);
#define EXPECTED_LINE_MAX 100
char buffer[EXPECTED_LINE_MAX * 2];  // Suggest 2x, no real need to be stingy.

if (fgets(buffer, sizeof buffer, stdin)) {
  int n = 0;
  // add ------------->    " %n" 
  sscanf(buffer, "%d %f fred %n", &some_int, &some_float, &n);
  // Did scan complete, and to the end?
  if (n > 0 && buffer[n] == '\0') {
    // success, use `some_int, some_float`
  } else {
    ; // Report bad input and handle desired.
  }