
getchar() and stdin

Tags:

c

A related question is here, but my question is different.

I'd like to know more about the internals of getchar() and stdin. I know that getchar() just ultimately calls fgetc(stdin).

My question is about buffering, stdin and getchar() behavior. Given the classic K&R example:

#include <stdio.h>

int main(void)
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}

It seems to me that getchar()'s behavior could be described as follows:

If there's nothing in the stdin buffer, let the OS accept user input until [enter] is pressed. Then return the first character in the buffer.

Assume the program is run and the user types "anchovies."

So, in the above code listing, the first call to getchar() awaits user input and assigns the first character in the buffer to variable c. Inside the loop, the first iteration's call to getchar() says "Hey, there's stuff in the buffer, return the next character in the buffer." But the Nth iteration of the while loop results in getchar() saying "Hey, there's nothing in the buffer, so let stdin gather what the user types."

I've spent a little time with the C source, but it seems this is more of a behavioral artifact of stdin rather than fgetc().

Am I wrong here? Thanks for your insight.

ybakos asked Oct 12 '11 14:10


3 Answers

The behaviour you're observing has nothing to do with C and getchar(), but with the teletype (TTY) subsystem in the OS kernel.

For this you need to know how processes get their input from your keyboard and how they write their output to your terminal window (I assume you use UNIX and the following explanations apply specifically to UNIX, i.e. Linux, macOS, etc.):

[Diagram not preserved]

The box entitled "Terminal" in the above diagram is your terminal window, e.g. xterm, iTerm, or Terminal.app. In the old times, terminals were separate hardware devices, consisting of a keyboard and a screen, and they were connected to a (possibly remote) computer over a serial line (RS-232). Every character typed on the terminal keyboard was sent over this line to the computer and consumed by an application that was connected to the terminal. And every character that the application produced as output was sent over the same line to the terminal, which displayed it on the screen.

Nowadays, terminals are not hardware devices anymore, but they moved "inside" the computer and became processes that are referred to as terminal emulators. xterm, iTerm2, Terminal.app, etc., are all terminal emulators.

However, the communication mechanism between applications and terminal emulators stayed the same as it was for hardware terminals. Terminal emulators emulate hardware terminals. That means, from the point of view of an application, talking to a terminal emulator today (e.g. iTerm2) works the same as talking to a real terminal (e.g. a DEC VT100) back in 1979. This mechanism was left unchanged so that applications developed for hardware terminals would still work with software terminal emulators.

So how does this communication mechanism work? UNIX has a subsystem called TTY in the kernel (TTY stands for teletype, which was the earliest form of computer terminals that didn't even have a screen, just a keyboard and a printer). You can think of TTY as a generic driver for terminals. TTY reads bytes from the port to which a terminal is connected (coming from the keyboard of the terminal), and writes bytes to this port (being sent to the display of the terminal).

There is a TTY instance for every terminal that is connected to a computer (or for every terminal emulator process running on the computer). Therefore, a TTY instance is also referred to as a TTY device (from the point of view of an application, talking to a TTY instance is like talking to a terminal device). In the UNIX manner of making driver interfaces available as files, these TTY devices are surfaced as /dev/tty* in some form, for example, on macOS they are /dev/ttys001, /dev/ttys002, etc.

An application can have its standard streams (stdin, stdout, stderr) directed to a TTY device (in fact, this is the default, and you can find out to which TTY device your shell is connected with the tty command). This means that whatever the user types on the keyboard becomes the standard input of the application, and whatever the application writes to its standard output is sent to the terminal screen (or terminal window of a terminal emulator). All this happens through the TTY device, that is, the application only communicates with the TTY device (this type of driver) in the kernel.

Now, the crucial point: the TTY device does more than just pass every input character to the standard input of the application. By default, the TTY device applies a so-called line discipline to the received characters. That means it buffers them locally, interprets delete, backspace and other line editing characters, and only passes the result to the standard input of the application when it receives a carriage return or line feed, which means that the user has finished entering and editing a whole line.

That means that until the user hits return, getchar() doesn't see anything in stdin. It's as if nothing had been typed so far. Only when the user hits return does the TTY device send these characters to the standard input of the application, where getchar() can read them immediately.

In that sense, there is nothing special about the behaviour of getchar(). It just immediately reads characters in stdin as they become available. The line buffering that you observe happens in the TTY device in the kernel.
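To watch this line-at-a-time delivery without stdio in between, you could call read() directly. The following is only a sketch; the helper name read_chunk is made up for illustration. On a TTY in its default mode, a single read() blocks until return is pressed and then hands over at most one line, no matter how large the buffer is.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: perform one read() on fd and return how many
 * bytes the kernel handed over (0 at end of input, -1 on error).
 * The call is retried only if a signal interrupted it. */
ssize_t read_chunk(int fd, char *buf, size_t cap)
{
    ssize_t n;
    do {
        n = read(fd, buf, cap);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

If you call read_chunk(STDIN_FILENO, buf, sizeof buf) in a loop and print the return value, typing "anchovies" followed by return on a terminal with default settings should report 10 bytes in one call: the nine letters plus the newline.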

Now to the interesting part: this TTY device can be configured. You can do it, for example, from a shell with the stty command. This allows you to configure almost every aspect of the line discipline that the TTY device applies to incoming characters. Or you can disable any processing whatsoever by setting the TTY device to raw mode. In this case, the TTY device forwards every received character immediately to stdin of the application without any form of editing.
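The same settings are reachable from C through the termios structure. As a small sketch (the two helper names are invented for illustration), this is roughly what switching off only the line buffering looks like, leaving echo and the signal keys alone, similar in spirit to stty -icanon min 1 time 0:

```c
#include <stdbool.h>
#include <termios.h>

/* True if the settings in *t have line-at-a-time (canonical) input on. */
bool is_canonical(const struct termios *t)
{
    return (t->c_lflag & ICANON) != 0;
}

/* Hypothetical helper: switch off only the line buffering in *t.
 * Echo (ECHO) and signal keys such as Ctrl-C (ISIG) are left untouched. */
void disable_canonical(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)ICANON;
    t->c_cc[VMIN] = 1;   /* read() returns once at least one byte is available */
    t->c_cc[VTIME] = 0;  /* no inter-byte timeout */
}
```

A caller would fetch the current settings with tcgetattr(STDIN_FILENO, &t), pass them through disable_canonical(&t), and apply them with tcsetattr(STDIN_FILENO, TCSANOW, &t), keeping a copy of the original settings to restore before exiting.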

If you enable raw mode in the TTY device, you will see that getchar() immediately receives every character that you type on the keyboard. The following C program demonstrates this:

#include <stdio.h>
#include <unistd.h>   // STDIN_FILENO, isatty(), ttyname()
#include <stdlib.h>   // exit()
#include <termios.h>

int main(void) {
    struct termios tty_opts_backup, tty_opts_raw;

    if (!isatty(STDIN_FILENO)) {
        fprintf(stderr, "Error: stdin is not a TTY\n");
        exit(1);
    }
    printf("stdin is %s\n", ttyname(STDIN_FILENO));

    // Back up current TTY settings
    if (tcgetattr(STDIN_FILENO, &tty_opts_backup) != 0) {
        perror("tcgetattr");
        exit(1);
    }

    // Change TTY settings to raw mode. cfmakeraw() modifies an existing
    // settings struct rather than initializing one, so start from a copy
    // of the current settings.
    tty_opts_raw = tty_opts_backup;
    cfmakeraw(&tty_opts_raw);
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_raw);

    // Read and print characters from stdin until Ctrl-C (0x03) or EOF
    int c, i = 1;
    for (c = getchar(); c != 3 && c != EOF; c = getchar()) {
        printf("%d. 0x%02x (0%02o)\r\n", i++, c, c);
    }
    if (c == 3) {
        printf("You typed 0x03 (003). Exiting.\r\n");
    }

    // Restore previous TTY settings
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_backup);
    return 0;
}

The program sets the current process' TTY device to raw mode, then uses getchar() to read and print characters from stdin in a loop. The characters are printed as ASCII codes in hexadecimal and octal notation. The program specially interprets the ETX character (ASCII code 0x03) as a trigger to terminate. You can produce this character on your keyboard by typing Ctrl-C; in raw mode the TTY device no longer turns Ctrl-C into an interrupt signal, so the program receives the byte itself.

weibeld answered Oct 13 '22 15:10


getchar()'s input is line-buffered, and the input buffer is limited; usually it's 4 kB. What you see at first is the echo of each character you type. When you press ENTER, getchar() starts returning characters up to the LF (which is converted to CR-LF). If you keep pressing keys without an LF for some time, echoing stops after 4096 characters and you have to press ENTER to continue.

ott-- answered Oct 13 '22 17:10


I know that getchar() just ultimately calls fgetc(stdin).

Not necessarily. getchar and getc may just as well be macros that expand to the actual procedure of reading from a file, with fgetc implemented as

int fgetc(FILE *fp)
{
    return getc(fp);
}
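One practical consequence, whichever way a given libc does it: the C standard lets getc be a macro that evaluates its stream argument more than once, whereas fgetc is guaranteed to be a real function. So a stream expression with side effects is only safe with fgetc. A small sketch (the helper name next_byte is made up):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read one byte from streams[*i], then advance *i.
 * fgetc() is used on purpose: getc() may be a macro that evaluates its
 * argument more than once, which could advance *i more than once. */
int next_byte(FILE *streams[], size_t *i)
{
    return fgetc(streams[(*i)++]);
}
```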

Hey, there's nothing in the buffer, so let stdin gather what the user types. [...] it seems this is more of a behavioral artifact of stdin rather than fgetc().

I can only tell you what I know, and that is how Unix/Linux works. On that platform, a FILE (including the thing that stdin points to) holds a file descriptor (an int) that is passed to the OS to indicate from which input source the FILE gets data, plus a buffer and some other bookkeeping stuff.

The "gather" part then means "call the read system call on the file descriptor to fill the buffer again". This varies per implementation of C, though.
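To make that concrete, here is a toy version of the idea. It is only a sketch under my own assumptions: the struct layout, the 4096-byte buffer size and all names are invented, and real libc implementations differ in the details.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy stand-in for FILE: a descriptor, a buffer, and bookkeeping.
 * Every name here is hypothetical. */
struct toy_file {
    int fd;            /* descriptor handed to read() */
    size_t pos, len;   /* next unread byte, number of valid bytes */
    unsigned refills;  /* how many times the buffer had to be refilled */
    char buf[4096];
};

#define TOY_EOF (-1)

/* Return the next byte as an unsigned char value, or TOY_EOF. */
int toy_getc(struct toy_file *f)
{
    if (f->pos == f->len) {          /* buffer empty: "gather" more input */
        ssize_t n;
        do {
            n = read(f->fd, f->buf, sizeof f->buf);
        } while (n == -1 && errno == EINTR);
        f->refills++;
        if (n <= 0)
            return TOY_EOF;          /* end of input or error */
        f->pos = 0;
        f->len = (size_t)n;
    }
    return (unsigned char)f->buf[f->pos++];
}
```

Initialized as struct toy_file in = { .fd = STDIN_FILENO }; and used on a terminal with default settings, the first toy_getc(&in) would block inside read() until return is pressed. After "anchovies" plus return it gets ten bytes at once, the following nine calls are served from the buffer without any system call, and the eleventh call blocks in read() again. That is the pattern described in the question.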

Fred Foo answered Oct 13 '22 16:10