What is "namespace cleanliness", and how does glibc achieve it?

I came across this paragraph from this answer by @zwol recently:

The __libc_ prefix on read is because there are actually three different names for read in the C library: read, __read, and __libc_read. This is a hack to achieve "namespace cleanliness", which you only need to worry about if you ever set out to implement a full-fledged and fully standards compliant C library. The short version is that there are many functions in the C library that need to call read, but some of them cannot use the name read to call it, because a C program is technically allowed to define a function named read itself.

As some of you may know, I am setting out to implement my own full-fledged and fully standards-compliant C library, so I'd like more details on this.

What is "namespace cleanliness", and how does glibc achieve it?

asked Aug 30 '19 at 20:08 by S.S. Anne


2 Answers

First, note that the identifier read is not reserved by ISO C at all. A strictly conforming ISO C program can have an external variable or function called read. Yet POSIX has a function called read. So how can a POSIX platform provide read and at the same time accept such a program? After all, fread and fgets probably use read; won't they break?

One way would be to split the POSIX functions into separate libraries: the user would have to link with -lio (or something similar) to get read, write and the other functions, and fread and getc would use some alternative read function so that they work even without -lio.

The approach in glibc is not to use symbols like read internally, but to stay out of the way by using alternative names like __libc_read in a reserved namespace. read is still available to POSIX programs because it is a weak alias for __libc_read. Programs that make an external reference to read but do not define it resolve to the weak symbol read, which is an alias for __libc_read. Programs that define read override the weak symbol, and all of their references to read go to that override.
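The weak-alias arrangement can be sketched with the GCC/Clang `alias` and `weak` attributes on an ELF target. The names `__mylib_read` and `mylib_read` are hypothetical stand-ins for `__libc_read` and `read` (only the implementation may legitimately define `__`-prefixed names), and the body fakes the system call so the sketch runs anywhere. glibc wraps the same attribute in an internal `weak_alias` macro, but this is an illustration, not glibc's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal implementation, named in the reserved namespace.
   A real libc would make the read system call here; this stand-in just
   copies a fixed string into the buffer. */
long __mylib_read(int fd, void *buf, size_t count)
{
    static const char data[] = "hello";
    size_t n = count < sizeof data - 1 ? count : sizeof data - 1;

    (void)fd;
    memcpy(buf, data, n);
    return (long)n;
}

/* Public name: a weak alias for the internal symbol (GCC/Clang, ELF).
   If a program defines its own mylib_read, the linker prefers that
   strong definition; __mylib_read itself is unaffected either way. */
long mylib_read(int fd, void *buf, size_t count)
    __attribute__((weak, alias("__mylib_read")));
```

A program that does not define `mylib_read` gets the library's code through the alias; one that does define it simply replaces the public name, while any library code calling `__mylib_read` keeps working.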

The important part is that this has no effect on __libc_read. Moreover, wherever the library itself needs the read function, it calls the internal name __libc_read, which the program cannot affect.
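To illustrate the library side, here is a minimal stdio-like `getc` that refills its buffer only through the reserved name. All names are hypothetical, -1 plays the role of stdio's EOF, and an in-memory "file" stands in for a real file descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file" replacing a real descriptor. */
static const char mylib_file[] = "abc";
static size_t mylib_pos;

/* Hypothetical internal read in the reserved namespace. */
long __mylib_read(int fd, void *buf, size_t count)
{
    size_t left = sizeof mylib_file - 1 - mylib_pos;
    size_t n = count < left ? count : left;

    (void)fd;
    memcpy(buf, mylib_file + mylib_pos, n);
    mylib_pos += n;
    return (long)n;
}

/* stdio-like getc: refills its small buffer via the internal name only,
   so a program defining its own public "read" cannot hijack this call. */
int mylib_getc(void)
{
    static unsigned char buf[2];
    static size_t len, idx;

    if (idx == len) {
        long n = __mylib_read(0, buf, sizeof buf);
        if (n <= 0)
            return -1; /* end of input, like EOF */
        len = (size_t)n;
        idx = 0;
    }
    return buf[idx++];
}
```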

So all of this adds up to a kind of cleanliness. It's not a general form of namespace cleanliness feasible in a situation with many components, but it works in a two-party situation where our only requirement is to separate "the system library" and "the user application".

answered Nov 10 '22 at 03:11 by Kaz


OK, first some basics about the C language as specified by the standard. So that you can write C applications without worrying that some of your identifiers might clash with external identifiers used to implement the standard library, or with macros, declarations, etc. used internally in the standard headers, the language standard splits the possible identifiers into namespaces reserved for the implementation and namespaces reserved for the application. The relevant text is:

7.1.3 Reserved identifiers

Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

  • All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
  • All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
  • Each macro name in any of the following subclauses (including the future library directions) is reserved for use as specified if any of its associated headers is included; unless explicitly stated otherwise (see 7.1.4).
  • All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage.184)
  • Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included.

No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.

For example, the identifier read is reserved for the application in all contexts ("No other identifiers are reserved"), but the identifier __read is reserved for the implementation in all contexts (the first bullet point).
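The first two bullet points are mechanical enough to check in code. This sketch, with hypothetical helper names, covers only those two underscore rules; it knows nothing about names reserved through the library subclauses (such as `printf` or `errno`), so it says nothing about them.

```c
#include <ctype.h>
#include <stdbool.h>

/* Bullet 1: underscore followed by an uppercase letter or a second
   underscore is reserved for any use (e.g. __read, _Exit). */
bool reserved_for_any_use(const char *id)
{
    return id[0] == '_' &&
           (id[1] == '_' || isupper((unsigned char)id[1]));
}

/* Bullet 2: any leading underscore is reserved for identifiers with
   file scope in the ordinary and tag name spaces (e.g. _read). */
bool reserved_at_file_scope(const char *id)
{
    return id[0] == '_';
}
```

By these rules `_read` is still usable as a local variable or structure member, while `__read` is off limits everywhere.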

Now, POSIX defines a lot of interfaces that are not part of the standard C language, and libc implementations might have a good deal more not covered by any standards. That's okay so far, assuming the tooling (linker) handles it correctly. If the application doesn't include <unistd.h> (outside the scope of the language standard), it can safely use the identifier read for any purpose it wants, and nothing breaks even though libc contains an identifier named read.

The problem is that a libc for a unix-like system is also going to want to use the function read to implement parts of the base C language's standard library, like fgetc (and all the other stdio functions built on top of it). This is a problem, because now you can have a strictly conforming C program such as:

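#include <stdio.h>
#include <stdlib.h>

/* Strictly conforming: ISO C does not reserve the name read. */
void read(void)
{
    abort();
}

int main(void)
{
    getchar(); /* must not end up calling the read above */
    return 0;
}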

and, if libc's stdio implementation calls read as its backend, getchar will end up calling the application's function (with the wrong signature, no less, which could crash for other reasons), producing the wrong behavior for a simple, strictly conforming program.

The solution is for libc to have an internal function named __read (or any other name in the reserved namespace) that stdio calls, and to have the public read function call that. Alternatively, read can be a weak alias for it, which achieves the same thing more efficiently and more flexibly under traditional unix linker semantics. Note that some namespace issues are more complex than read and cannot be solved without weak aliases.
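For comparison, the non-alias option (a public function that simply calls the internal one) might look like the following. The names are hypothetical and the body fills the buffer with a placeholder byte in place of the system call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical internal implementation in the reserved namespace.
   Stand-in body: fill the buffer with 'x' instead of reading a file. */
long __mylib_read(int fd, void *buf, size_t count)
{
    (void)fd;
    memset(buf, 'x', count);
    return (long)count;
}

/* Public name as an ordinary wrapper rather than a weak alias. Library
   code such as stdio keeps calling __mylib_read directly. The wrapper
   costs an extra call, and since it is a strong definition, a static
   library would need it in its own object file so it is only linked in
   when the program does not define mylib_read itself. */
long mylib_read(int fd, void *buf, size_t count)
{
    return __mylib_read(fd, buf, count);
}
```

The weak alias avoids both the extra call and the one-wrapper-per-object-file constraint, which is one way to read "more efficient and more flexible" above.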

answered Nov 10 '22 at 05:11 by R.. GitHub STOP HELPING ICE