
Inner working of the C standard library

Tags: c, standards

I am interested in the inner workings of the standard C library. I found a good book about one possible implementation, but I am looking for a deeper explanation of the whole standard library and of how the relevant standards (like POSIX) define it.

The C drafts are very helpful but not very nice to read. Is there other literature about this topic?

  • Standard-Library-P-J-Plauger 1991
  • FreeBSD
  • GNU man
  • C draft(s)

Albertus

swaechter asked Apr 02 '12 16:04


2 Answers

A good starting point would be POSIX. The POSIX 2008 specification is available online here:

http://pubs.opengroup.org/onlinepubs/9699919799/

It's more accessible (but sometimes less rigorous) than the C standard, and covers a lot more than just the C standard, i.e. most of the standardized parts of Unix-like systems' standard libraries.

If you're interested in implementations, the first thing to be aware of is that the POSIX-described behavior is usually split (out of necessity and for pragmatic reasons) between the kernel implementation and the userspace libc implementation. A large number of the functions in POSIX (and a few from the C standard) are merely wrappers for "system calls", i.e. transitions into kernelspace to service the request. On some libc implementations, even finding these wrappers is difficult, since they're often automatically generated by the build scripts, unified into a single assembly-language file, or both.
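
To make the wrapper idea concrete, here is a minimal sketch, assuming Linux and a libc that exposes the generic syscall() function. The helper names raw_getpid and wrapped_getpid are hypothetical and exist only for illustration; a real libc usually issues the system call from generated or assembly code rather than through syscall().

```c
/* Linux-specific sketch: two routes to the same kernel service. */
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Hypothetical helper: request the getpid system call by number through
 * the generic syscall() entry point instead of the named wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* The usual route: the libc wrapper that POSIX names. */
pid_t wrapped_getpid(void)
{
    return getpid();
}
```

Both functions should report the same process ID, since the named wrapper ends up at the same system call.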

The major (significant amount of non-kernel code) subsystems of the standard library are generally:

  • stdio: On glibc, this is implemented by the GNU libio library, which is a unified implementation of C stdio and C++ iostream, optimized so that neither has to be slowed down by being a wrapper for the other. It's a big hack, and the code is difficult to find and follow. Other implementations (especially the BSDs, but also other libcs on Linux) are much simpler and clearer to read. Ultimately they're based on the underlying file-descriptor IO functions like open, read, etc.
  • POSIX threads: On glibc and modern uClibc, this is NPTL. I'm not familiar with the BSDs' thread implementations. Other Linux libcs either lack threads or provide their own implementations based mainly on Linux clone and futex syscalls.
  • Math library: ultimately, almost all of these are based on the old Sun math code from the early 90s, but they've diverged a lot. Fdlibm is a pretty good base approximation of the code used in modern libcs.
  • User, group, hostname (DNS), etc. lookups: This is handled through libnss in glibc, and directly in most other libcs.
  • Regular expression and glob matching
  • Time and timezone handling
  • Locale and charset conversion
  • Malloc
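
As a rough illustration of the stdio item above, buffered output layered on a file descriptor can be sketched as follows. This is not how any particular libc does it: struct mini_file, mini_putc and mini_flush are hypothetical names, the 64-byte buffer is an arbitrary choice, and the sketch ignores details a real implementation must handle (EINTR, locking, line buffering, error state).

```c
#include <stddef.h>   /* size_t */
#include <unistd.h>   /* write(), ssize_t */

/* Hypothetical miniature of a stdio stream: a descriptor plus a buffer. */
struct mini_file {
    int fd;           /* underlying file descriptor */
    size_t used;      /* number of bytes currently buffered */
    char buf[64];     /* arbitrary small buffer for the sketch */
};

/* Hand all buffered bytes to write(); returns 0 on success, -1 if
 * write() fails or makes no progress. */
int mini_flush(struct mini_file *f)
{
    size_t off = 0;
    while (off < f->used) {
        ssize_t n = write(f->fd, f->buf + off, f->used - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    f->used = 0;
    return 0;
}

/* Append one byte, flushing first if the buffer is already full. */
int mini_putc(struct mini_file *f, char c)
{
    if (f->used == sizeof f->buf && mini_flush(f) < 0)
        return -1;
    f->buf[f->used++] = c;
    return 0;
}
```

The point of the buffer is that many mini_putc calls turn into few write() system calls; the same trade-off is what the real stdio layer manages.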

If you want to get started reading sources, I would recommend not starting with glibc. It's very large and unwieldy. If you do want to read glibc, be aware that a lot of the code is hidden under the sysdeps trees, organized according to the range of systems each piece applies to.

Dietlibc is quite readable, but if you read its source, be aware that it's full of common C programming mistakes (e.g. using int where size_t is needed, not checking for overflows, etc.). If you keep this in mind, it might not be a bad choice, since ignoring lots of possible errors/failures tends to make the code very simple.
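
To show what checking for overflow looks like in practice, here is a small sketch of an allocation helper that uses size_t throughout and refuses a request whose byte count would wrap around. The name alloc_array is hypothetical and is not taken from dietlibc or any other libc.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc() */

/* Hypothetical helper: allocate room for n elements of `size` bytes each,
 * returning NULL if n * size cannot be represented in size_t. */
void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;              /* the product would overflow */
    return malloc(n * size);
}
```

Without the check, a huge n would silently wrap to a small product, and the caller would receive a buffer far smaller than it believes it has. Code that stores sizes in int has the same kind of problem at a much lower limit.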

With that said, for reading libc source, I would most recommend either one of the BSDs or musl (disclaimer: I am the primary author of musl, so I am a bit biased here). The BSDs have the additional advantage that their kernelspace code is extremely simple and readable, so if you want to read the kernel code on the other side of a system call, you can do that too.

R.. GitHub STOP HELPING ICE answered Sep 22 '22 14:09


In "C: A Reference Manual, Fifth Edition" by Harbison & Steele, the second part of the book is dedicated to the C Standard library (Part 2: chapters 10-24).

http://careferencemanual.com

The Rationale document for C99 didn't cover the C library, but the ANSI C89 Rationale covers it in chapter 4. There is a copy of the document here:

http://www.lysator.liu.se/c/rat/title.html

ouah answered Sep 21 '22 14:09