
When does scanf start and stop scanning?

Tags: c, io, buffer, scanf

It seems scanf begins scanning the input when the Enter key is pressed, and I want to verify this with the code below (I eliminated error checking and handling for simplicity).

#include <stdio.h>

int main(int argc, char **argv) {
    /* disable buffering */
    setvbuf(stdin, NULL, _IONBF, 0);
    int number;

    scanf("%d", &number);
    printf("number: %d\n", number);

    return 0;
}

Here comes another problem: after I disable input buffering (just to verify the result; I know I should almost never do that in real code, in case it interferes with the results), the output is as follows (note the extra prompt):

$ ./ionbf
12(space)(enter)
number: 12
$
$

which is different from the output when input buffering is enabled (no extra prompt):

$ ./iofbf
12(space)(enter)
number: 12
$

It seems the newline character is consumed when buffering is enabled. I tested on two different machines, one with gcc 4.1.2 and bash 3.2.25 installed, the other with gcc 4.4.4 and bash 4.1.5, and the result is the same on both.

The problems are:

  1. How can the different behaviors with input buffering enabled and disabled be explained?
  2. Back to the original problem, when does scanf begin scanning user input? The moment a character is entered? Or is it buffered until a line completes?
Summer_More_More_Tea asked Oct 17 '12 02:10


1 Answer

Interesting question — long-winded answer. In case of doubt, I'm describing what I think happens on Unix; I leave Windows to other people. I think the behaviour would be similar, but I'm not sure.

When you use setvbuf(stdin, NULL, _IONBF, 0), you force the stdin stream to read one character at a time using the read(0, buffer, 1) system call. When you run with _IOFBF or _IOLBF, then the code managing the stream will attempt to read many more bytes at a time (up to the size of the buffer you provide if you use setvbuf(), or BUFSIZ if you don't). These observations plus the space in your input are key to explaining what happens. I'm assuming your terminal is in normal or canonical input mode — see Canonical vs non-canonical terminal input for a discussion of that.

You are correct that the terminal driver does not make any characters available until you type return. This allows you to use backspace etc to edit the line as you type it.
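If you want to confirm that your terminal really is in canonical mode, the following is a minimal sketch using the POSIX termios interface. The helper name is_canonical is my own invention for illustration; tcgetattr() and the ICANON flag are the standard pieces.

```c
#include <termios.h>
#include <unistd.h>   /* STDIN_FILENO, for callers of the helper */

/* Hypothetical helper (the name is an assumption, not a library function).
 * Returns 1 if fd is a terminal in canonical (line-at-a-time) mode,
 * 0 if it is a terminal in non-canonical mode,
 * and -1 if fd is not a terminal or tcgetattr() fails. */
int is_canonical(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;
    return (tio.c_lflag & ICANON) ? 1 : 0;
}
```

Called as is_canonical(STDIN_FILENO) from main(), I would expect 1 in an ordinary interactive shell session, and -1 when stdin is redirected from a file or a pipe.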

When you hit return, the kernel has 4 characters available to send to any program that wants to read them: 1 2 space return.

In the case where you are not using _IONBF, those 4 characters are all read at once into the standard I/O buffer for stdin by a call such as read(0, buffer, BUFSIZ). The scanf() then collects the 1, the 2 and the space characters from the buffer, and puts the space back into the buffer. (Note that the kernel has passed all four characters to the program.) The program prints its output and exits; the unread space and newline in its buffer are discarded along with the process. The shell resumes, prints a prompt and waits for more input to become available, but there won't be any until the user types another return, possibly (usually) preceded by some other characters.
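The "puts back" step can be observed without a terminal. Here is a small sketch, assuming a POSIX system that provides fmemopen(); the function name next_char_after_int is hypothetical. It parses one integer with "%d" and then asks the stream for the next character, which should be the delimiter that ended the number.

```c
#define _POSIX_C_SOURCE 200809L  /* for fmemopen() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one integer from text with "%d" and returns
 * the next character the stream delivers afterwards, or EOF on any failure
 * (including when no integer could be parsed). */
int next_char_after_int(const char *text)
{
    /* fmemopen() takes a non-const buffer even in "r" mode, hence the cast. */
    FILE *fp = fmemopen((void *)text, strlen(text), "r");
    int number;
    int next = EOF;

    if (fp == NULL)
        return EOF;
    if (fscanf(fp, "%d", &number) == 1)
        next = fgetc(fp);   /* the character "%d" looked at but did not consume */
    fclose(fp);
    return next;
}
```

With the input "12 \n" the function returns the space character, not the newline: the conversion had to look at the space to know the number was finished, but it left the space in the stream for the next read.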

In the case where you are using _IONBF, the program reads the characters one at a time. It makes a read() call to get one character and gets the 1; it makes another read() call and gets the 2; it makes another read() call and gets the space character. (Note that the kernel still has the return ready and waiting.) It doesn't need the space to interpret the number, so it puts it back in its pushback buffer (there is guaranteed to be space for at least one byte in the pushback buffer), ready for the next standard I/O read operation, and returns. The program prints its output and exits. The shell resumes, prints a prompt, and tries to read a new command from the terminal. The kernel obliges by returning the newline that is waiting, and the shell says "Oh, that's an empty command" and gives you another prompt.

You can demonstrate this is what happens by typing 1 2 x p s return to your (_IONBF) program. When you do that, your program reads the value 12 and the 'x', leaving 'ps' and the newline to be read by the shell, which will then execute the ps command (without echoing the characters that it read), and then prompt again.

You could also use truss or strace or a similar command to trace the system calls made by your program and verify what I suggest happens.

Jonathan Leffler answered Oct 18 '22 04:10