getchar() and stdin

Tags:

c

A related question is here, but my question is different.

But I'd like to know more about the internals of getchar() and stdin. I know that getchar() just ultimately calls fgetc(stdin).

My question is about buffering, stdin and getchar() behavior. Given the classic K&R example:

#include <stdio.h>

int main(void)
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}

It seems to me that getchar()'s behavior could be described as follows:

If there's nothing in the stdin buffer, let the OS accept user input until [enter] is pressed. Then return the first character in the buffer.

Assume the program is run and the user types "anchovies."

So, in the above code listing, the first call to getchar() awaits user input and assigns the first character in the buffer to variable c. Inside the loop, the first iteration's call to getchar() says "Hey, there's stuff in the buffer, return the next character in the buffer." But the Nth iteration of the while loop results in getchar() saying "Hey, there's nothing in the buffer, so let stdin gather what the user types."

I've spent a little time with the C source, but it seems this is more of a behavioral artifact of stdin rather than fgetc().

Am I wrong here? Thanks for your insight.


asked Oct 12 '11 14:10

ybakos

3 Answers

The behaviour you're observing has nothing to do with C and getchar(), but with the teletype (TTY) subsystem in the OS kernel.

For this you need to know how processes get their input from your keyboard and how they write their output to your terminal window (I assume you use UNIX; the following explanations apply specifically to UNIX-like systems, e.g. Linux and macOS):

[Diagram: the terminal and its connection to the application (image not available)]

The box entitled "Terminal" in the diagram above is your terminal window, e.g. xterm, iTerm, or Terminal.app. In the old days, terminals were separate hardware devices, consisting of a keyboard and a screen, and they were connected to a (possibly remote) computer over a serial line (RS-232). Every character typed on the terminal keyboard was sent over this line to the computer and consumed by an application that was connected to the terminal. And every character that the application produced as output was sent over the same line to the terminal, which displayed it on the screen.

Nowadays, terminals are usually no longer hardware devices; they have moved "inside" the computer and become processes referred to as terminal emulators. xterm, iTerm2, Terminal.app, etc., are all terminal emulators.

However, the communication mechanism between applications and terminal emulators stayed the same as it was for hardware terminals. Terminal emulators emulate hardware terminals. That means, from the point of view of an application, talking to a terminal emulator today (e.g. iTerm2) works the same as talking to a real terminal (e.g. a DEC VT100) back in 1979. This mechanism was left unchanged so that applications developed for hardware terminals would still work with software terminal emulators.

So how does this communication mechanism work? UNIX has a subsystem called TTY in the kernel (TTY stands for teletype, which was the earliest form of computer terminals that didn't even have a screen, just a keyboard and a printer). You can think of TTY as a generic driver for terminals. TTY reads bytes from the port to which a terminal is connected (coming from the keyboard of the terminal), and writes bytes to this port (being sent to the display of the terminal).

There is a TTY instance for every terminal that is connected to a computer (or for every terminal emulator process running on the computer). Therefore, a TTY instance is also referred to as a TTY device (from the point of view of an application, talking to a TTY instance is like talking to a terminal device). In the UNIX manner of making driver interfaces available as files, these TTY devices are surfaced as /dev/tty* in some form; for example, on macOS they are /dev/ttys001, /dev/ttys002, etc.

An application can have its standard streams (stdin, stdout, stderr) directed to a TTY device (in fact, this is the default, and you can find out to which TTY device your shell is connected with the tty command). This means that whatever the user types on the keyboard becomes the standard input of the application, and whatever the application writes to its standard output is sent to the terminal screen (or terminal window of a terminal emulator). All this happens through the TTY device, that is, the application only communicates with the TTY device (this type of driver) in the kernel.

Now, the crucial point: the TTY device does more than just pass every input character to the standard input of the application. By default, the TTY device applies a so-called line discipline to the received characters. That is, it buffers them locally, interprets delete, backspace and other line-editing characters, and only passes them to the application's standard input when it receives a carriage return or line feed, which means that the user has finished entering and editing a whole line.

As a result, until the user hits Return, getchar() doesn't see anything in stdin; it's as if nothing had been typed so far. Only when the user hits Return does the TTY device send these characters to the standard input of the application, where getchar() immediately reads them.

In that sense, there is nothing special about the behaviour of getchar(). It just immediately reads characters in stdin as they become available. The line buffering that you observe happens in the TTY device in the kernel.

Now to the interesting part: this TTY device can be configured. You can do it, for example, from a shell with the stty command. This allows you to configure almost every aspect of the line discipline that the TTY device applies to incoming characters. Or you can disable any processing whatsoever by setting the TTY device to raw mode. In this case, the TTY device forwards every received character immediately to stdin of the application without any form of editing.

If you enable raw mode in the TTY device, you will see that getchar() immediately receives every character that you type on the keyboard. The following C program demonstrates this:

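#include <stdio.h>
#include <stdlib.h>   // exit(), EXIT_FAILURE
#include <termios.h>  // struct termios, tcgetattr(), tcsetattr(), cfmakeraw()
#include <unistd.h>   // STDIN_FILENO, isatty(), ttyname()

int main(void) {
    struct termios tty_opts_backup, tty_opts_raw;

    if (!isatty(STDIN_FILENO)) {
        fprintf(stderr, "Error: stdin is not a TTY\n");
        exit(EXIT_FAILURE);
    }
    printf("stdin is %s\n", ttyname(STDIN_FILENO));

    // Back up current TTY settings
    if (tcgetattr(STDIN_FILENO, &tty_opts_backup) == -1) {
        perror("tcgetattr");
        exit(EXIT_FAILURE);
    }

    // Start from the current settings, then switch them to raw mode.
    // cfmakeraw() only changes some flags, so the rest of the struct
    // must already hold valid values. (cfmakeraw() is a BSD/glibc
    // extension, not part of POSIX.)
    tty_opts_raw = tty_opts_backup;
    cfmakeraw(&tty_opts_raw);
    if (tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_raw) == -1) {
        perror("tcsetattr");
        exit(EXIT_FAILURE);
    }

    // Read and print characters from stdin. Raw mode also disables
    // signal generation, so Ctrl-C arrives as the byte 0x03.
    int c, i = 1;
    for (c = getchar(); c != EOF && c != 3; c = getchar()) {
        // "\r\n" because raw mode also turns off output post-processing
        printf("%d. 0x%02x (0%02o)\r\n", i++, c, c);
    }
    if (c == 3)
        printf("You typed 0x03 (003). Exiting.\r\n");

    // Restore previous TTY settings
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_backup);
    return 0;
}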

The program sets the current process's TTY device to raw mode, then uses getchar() to read and print characters from stdin in a loop. The characters are printed as ASCII codes in hexadecimal and octal notation. The program treats the ETX character (ASCII code 0x03) as the trigger to terminate. You can produce this character on your keyboard by typing Ctrl-C; in raw mode, Ctrl-C no longer sends an interrupt signal but arrives as an ordinary byte.


answered Oct 13 '22 15:10

weibeld

getchar()'s input is line-buffered, and the input buffer is limited, usually to 4 kB. What you see at first is the echo of each character you're typing. When you press ENTER, getchar() starts returning characters up to the LF (which is converted to CR-LF). If you keep pressing keys without an LF for some time, echoing stops after 4096 characters and you have to press ENTER to continue.

answered Oct 13 '22 17:10

ott--

I know that getchar() just ultimately calls fgetc(stdin).

Not necessarily. getchar and getc might as well expand to the actual procedure of reading from a file, with fgetc implemented as

int fgetc(FILE *fp)
{
    return getc(fp);
}

Hey, there's nothing in the buffer, so let stdin gather what the user types. [...] it seems this is more of a behavioral artifact of stdin rather than fgetc().

I can only tell you what I know, and that is how Unix/Linux works. On that platform, a FILE (including the thing that stdin points to) holds a file descriptor (an int) that is passed to the OS to indicate from which input source the FILE gets data, plus a buffer and some other bookkeeping stuff.

The "gather" part then means "call the read system call on the file descriptor to fill the buffer again". This varies per implementation of C, though.

answered Oct 13 '22 16:10

Fred Foo
