So I was going through K&R second edition doing the exercises. Feeling pretty confident after doing a few exercises, I thought I'd check the actual implementations of these functions. It was then that my confidence fled the scene. I could not understand any of it.
For example, I checked getchar():
Here is the prototype in libio/stdio.h
extern int getchar (void);
So I followed it through and got this:
__STDIO_INLINE int
getchar (void)
{
  return _IO_getc (stdin);
}
Again I followed it, this time to libio/getc.c:
int
_IO_getc (fp)
     FILE *fp;
{
  int result;
  CHECK_FILE (fp, EOF);
  _IO_acquire_lock (fp);
  result = _IO_getc_unlocked (fp);
  _IO_release_lock (fp);
  return result;
}
And I'm taken to another header file, libio/libio.h, which is pretty cryptic:
#define _IO_getc_unlocked(_fp) \
(_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
Which is where I finally ended my journey.
My question is pretty broad. What does all this mean? I could not for the life of me figure out anything logical by looking at the code. It looks like a bunch of code abstracted away in layer after layer.
More importantly, when does it really get the character from stdin?
_IO_getc_unlocked is an inlinable macro. The idea is that you can get a character from the stream without having to call a function, which should make it fast enough to use in tight loops and the like.
Let's take it apart one layer at a time. First, what is _IO_BE?
/usr/include/libio.h:# define _IO_BE(expr, res) __builtin_expect ((expr), res)
_IO_BE is a hint to the compiler that expr will usually evaluate to res. It's used to structure code flow to be faster when the expectation is true, but has no other semantic effect. So we can get rid of that, leaving us with:
#define _IO_getc_unlocked(_fp) \
  ( ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end) \
    ? __uflow(_fp) : *(unsigned char *)(_fp)->_IO_read_ptr++ )
Let's turn this into an inline function for clarity:
inline int _IO_getc_unlocked(FILE *fp) {
    if (fp->_IO_read_ptr >= fp->_IO_read_end)
        return __uflow(fp);   /* buffer exhausted: refill it */
    else
        return *(unsigned char *)(fp->_IO_read_ptr++);   /* fast path */
}
In short, we have a pointer into a buffer, and a pointer to the end of the buffer. We check whether the read pointer has run past the buffered data; if not, we increment it and return the character that was at its old position. Otherwise we call __uflow to refill the buffer and return the newly read character.
As such, this allows us to avoid the overhead of a function call until we actually need to do IO to refill the input buffer.
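To make the "when does it really read" part concrete, here is a minimal sketch of the same fast-path/slow-path split. Everything in it (toy_stream, toy_open, toy_uflow, toy_getc, the 4-byte buffer) is made up for illustration and is not glibc's FILE; the "input source" is just a string in memory, where a real implementation would ask the operating system for more bytes.

```c
#include <stddef.h>
#include <stdio.h>    /* EOF */
#include <string.h>   /* strlen, memcpy */

/* Hypothetical toy stream, NOT glibc's FILE: same idea, far fewer fields. */
struct toy_stream {
    const char *src;          /* stands in for the real input source */
    size_t src_left;          /* bytes not yet copied into buf */
    unsigned char buf[4];     /* deliberately tiny so refills are visible */
    unsigned char *read_ptr;  /* next unread byte in buf */
    unsigned char *read_end;  /* one past the last valid byte in buf */
    int refills;              /* how many times the slow path did real work */
};

static void toy_open(struct toy_stream *s, const char *text)
{
    s->src = text;
    s->src_left = strlen(text);
    s->read_ptr = s->read_end = s->buf;   /* buffer starts out empty */
    s->refills = 0;
}

/* Slow path, playing the role of __uflow: refill buf, return its first byte. */
static int toy_uflow(struct toy_stream *s)
{
    size_t n = s->src_left < sizeof s->buf ? s->src_left : sizeof s->buf;
    if (n == 0)
        return EOF;               /* nothing left in the source */
    memcpy(s->buf, s->src, n);    /* the only place "input" is actually fetched */
    s->src += n;
    s->src_left -= n;
    s->read_ptr = s->buf;
    s->read_end = s->buf + n;
    s->refills++;
    return *s->read_ptr++;
}

/* Fast path: usually just a compare, a load and an increment. */
static int toy_getc(struct toy_stream *s)
{
    if (s->read_ptr >= s->read_end)
        return toy_uflow(s);
    return *s->read_ptr++;
}
```

Reading an 11-byte string through toy_getc with this 4-byte buffer hits toy_uflow's copy only three times; the other eight calls never leave the fast path.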
Keep in mind that standard library functions can be complicated like this; they can also use extensions to the C language (such as __builtin_expect) that are NOT standard and may NOT work on all compilers. They do this because they need to be fast, and because they can make assumptions about what compiler they're using. Generally speaking, your own code should not use such extensions unless absolutely necessary, as it'll make porting to other platforms more difficult.
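If you do find yourself wanting such a hint in your own code, one common way to limit the portability cost is to hide the extension behind a macro with a plain fallback. The sketch below assumes GCC or Clang for the hinted branch; MY_UNLIKELY and count_newlines are names invented for this example.

```c
#include <stddef.h>

/* Use the hint where the compiler supports it, fall back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define MY_UNLIKELY(x) (!!(x))
#endif

/* Count newline characters in buf[0..n), hinting that newlines are rare. */
static size_t count_newlines(const char *buf, size_t n)
{
    size_t lines = 0;
    for (size_t i = 0; i < n; i++) {
        if (MY_UNLIKELY(buf[i] == '\n'))
            lines++;
    }
    return lines;
}
```

The result is the same with or without the hint; only the code layout the compiler picks may differ.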
Going from pseudo-code to real code we can break it down:
if (there is a character in the buffer)
    return (that character)
else
    call a function to refill the buffer and return the first character
end
Let's use the ?: operator:
#define getc(f) (is_there_buffered_stuff(f) ? *pointer++ : refill())
A bit closer:
#define getc(f) (is_there_buffered_stuff(f) ? *f->pointer++ : refill(f))
Now we are almost there. To determine whether there is something buffered already, it uses the file structure pointer and a read pointer within the buffer:
_fp->_IO_read_ptr >= _fp->_IO_read_end ?
This actually tests the opposite condition to my pseudo-code, "is the buffer empty", and if so, it calls __uflow(_fp) ("underflow"); otherwise, it just reaches directly into the buffer with a pointer, gets the character, and then increments the pointer:
? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)
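For comparison, here is a small macro in the same shape that you can compile and poke at. It is only a sketch: toy_buf, toy_refill and TOY_GETC are hypothetical names, and the refill function simply reports EOF because this toy has nothing to refill from.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical two-pointer buffer, standing in for the FILE fields above. */
struct toy_buf {
    const char *read_ptr;   /* next unread byte */
    const char *read_end;   /* one past the last buffered byte */
};

/* Stand-in for __uflow: this sketch has no underlying source, so it is EOF. */
static int toy_refill(struct toy_buf *b)
{
    (void)b;
    return EOF;
}

/* Same structure as the real macro: empty? slow path : read byte, then advance.
   The cast to unsigned char makes a byte like 0xFF come back as 255
   instead of a negative value that could be confused with EOF.
   Note that b is evaluated more than once, so pass a simple expression. */
#define TOY_GETC(b) \
    ((b)->read_ptr >= (b)->read_end \
        ? toy_refill(b) \
        : *(const unsigned char *)(b)->read_ptr++)
```

The post-increment applies to the pointer, not the character: the macro yields the byte at the old position and leaves read_ptr pointing at the next one.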