
How is the standard library for a programming language implemented?

I have trouble understanding how standard libraries for programming languages other than C are written.

As far as I understand, the C standard library can be implemented in a mixture of C and assembler, where the assembler is needed to invoke system calls, which is what makes fopen, fscanf, ... usable.

How do other programming languages provide this functionality (I/O, files, and everything else that needs system calls) in their standard library? Do they all allow inline assembler like C does, or is there some other way?

I have read that C and its standard library can be used to implement other languages' libraries, but I am not sure how this is done.

edit1. Trying to be more specific.
(Language for which standard library is implemented is referred to as new_lang.)

Could someone elaborate on how the second approach (using the C runtime) is done, at the object code level and the implementation level? The things I can't get my head around are:

  1. Is the C runtime invoked using C syntax or new_lang syntax? How do we call ssize_t write(int fd, const void *buf, size_t count) from somewhere within the new_lang library?
  2. What happens if new_lang doesn't have pointers as a data type? How is the second argument to write, const void *buf, passed from new_lang? How does new_lang follow the C runtime API if it doesn't have C data types?
  3. If some function from the new_lang library calls the C runtime, does that mean it must obey the C ABI? Must the sizes of types such as integer and char match between new_lang and C on a given platform (along with everything else the ABI specifies, such as whether arguments are passed on the stack or in registers)?
    Isn't this a little over-restrictive? For example, what if new_lang needs more bytes reserved for char?

I tried to be as general as possible, but I am not sure how to explain the problem without going into a little detail.

Rorsch asked Apr 02 '16 20:04


1 Answer

It depends on the language, and there can even be more than one answer for the same language. Note that standard libraries/runtimes implemented in C often use compiler-specific extensions and attributes, and are therefore not written in standard, unextended C.

For a language such as Pascal, multiple approaches are possible and do exist. Pascal is a language on the same level as C (and/or C++, since most surviving dialects are object oriented too). E.g. FreePascal has its runtime library written in Pascal and assembler, and can run on Linux without linking to any C-compiled code.

The reasons to go for C are usually more about management (availability of tools and programmers) than technical.

Meanwhile, GNU Pascal is basically a gcc mod, and builds on libgcc, glibc, etc.

Answer to edit1:

  1. Afaik that is very internal to the exact target you are using. There is something called write() that you can call from the system C compiler, but that might be a runtime function (man section 3) that wraps the syscall, not the syscall (man section 2) itself. Afaik it is guaranteed that section 3 functions are really functions and not just macros, but I'm not entirely sure about that.

On BSD the syscalls are fairly close to an ordinary function call; on Linux/i386 they are not. The source syntax doesn't matter; what matters is how the C compiler interprets it, and the code you generate must be equivalent (not identical, but close). Usually the only thing guaranteed to work (as far as classic POSIX philosophy goes) is the system C compiler, which is the only one guaranteed to be able to interpret the system headers, since they often contain non-standard extensions or modifiers. Anything else has to make sure it matches, possibly on a per-target basis. Most languages therefore build on top of the C runtime and usually have a C part in their own runtime.
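To make the wrapper-versus-syscall distinction concrete, here is a minimal C sketch, assuming Linux with glibc. The two function names are made up for illustration. The first goes through the ordinary write() wrapper; the second names the kernel entry point by number through syscall(), which is itself still a libc helper that sets up the registers and executes the trap instruction. A runtime with no C at all would do that last step in its own assembler instead.

```c
#define _GNU_SOURCE          /* makes <unistd.h> declare syscall() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_write */
#include <sys/types.h>       /* ssize_t */
#include <unistd.h>          /* write(), syscall() */

/* Route 1: call the C runtime's wrapper function, as C code normally does. */
ssize_t write_via_wrapper(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    return write(fd, buf, count);
}

/* Route 2: ask for the kernel's write by syscall number.
   (Hypothetical helper name; syscall() and SYS_write are real glibc/Linux names.) */
ssize_t write_via_syscall_number(int fd, const void *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

Both functions should behave the same from the caller's point of view; the difference is only in which layer the runtime of new_lang decides to depend on.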

  2. You must somehow make them match the C compiler for each target, either by adapting automatically (your whole system is based on C and the C compiler, so type equivalence propagates more or less by itself), by painfully crafting the equivalence target by target, or by wrapping each function in C or assembler code. Sometimes this has to be done several times per target (e.g. MS VC and mingw, though these are more compatible now than 10-15 years ago, when gcc wasn't COM compatible, for example).

E.g. Free Pascal has a cdecl; modifier to mark C-callable functions, and the compiler then generates calling code equivalent to that of the system C compiler on that target.
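The "wrap each function in C" option can be sketched as a small shim. This is only an illustration under assumptions: the name newlang_rt_write is hypothetical, and it assumes a new_lang that has no pointer type and passes a buffer as a plain 64-bit address. The shim exposes only fixed-width types, so the new_lang side never has to know how wide int, size_t, or ssize_t are on the target; the C compiler does the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>   /* write() */

/* Hypothetical shim compiled with the system C compiler and linked into
   the new_lang runtime. All parameter and return types have a fixed size
   on every target, unlike int / size_t / ssize_t. */
int64_t newlang_rt_write(int32_t fd, uint64_t buf_addr, uint64_t len)
{
    assert(buf_addr != 0 || len == 0);

    /* Turn the integer address back into a C pointer. */
    const void *buf = (const void *)(uintptr_t)buf_addr;

    /* From here on it is ordinary C; the compiler applies the platform's
       own widths and calling convention for write(). */
    return (int64_t)write((int)fd, buf, (size_t)len);
}
```

The new_lang compiler still has to call the shim itself with the platform's C calling convention, but it now only needs to get three integer arguments right, once, instead of matching every libc signature.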

This sounds bad, but there are usually only a few variants. That still doesn't make it easy: e.g. the x86_64 ABI differs between Linux/FreeBSD and OS X on one side (System V, with small OS X deviations) and Windows on the other (its own Win64 convention). You can avoid the problem by implementing as much of your system as possible in C, but then you are stuck with it (and a hybrid language system) forever. Moreover, this way Cisms and Unixisms creep into your new language, because that is easier.

Many languages on *nix go this way, because it makes quick initial ports to something new easier. But in turn you get to maintain a hybrid language system. Such a system also usually inherits many build-related C traits, like external preprocessors, headers that are included as text and reinterpreted over and over again, and a make-based build system.

For a list of possible issues, see "How to design a C / C++ library to be usable in many client languages?"

  3. Yes, but only the binary part of it, since of course the C compiler can't do strict forms of typechecking across languages. But sizes, field offsets (packing), argument order, register use, and things like whether small structs are passed in registers or not must match.
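A minimal sketch of what "sizes and field offsets must match" means in practice, assuming a hypothetical record type rt_record shared between the C runtime and a new_lang binding. The binding is imagined to hard-code byte offsets; the static assertions make the C compiler confirm that its own layout agrees with them. The numbers hold on the common x86 and ARM ABIs, and the build fails on a target where they do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record passed between the C runtime and new_lang. */
struct rt_record {
    uint8_t  tag;      /* followed by 3 bytes of padding */
    uint32_t length;
    uint64_t offset;
};

/* The layout the imagined new_lang binding has hard-coded. */
_Static_assert(offsetof(struct rt_record, tag)    == 0,  "tag offset");
_Static_assert(offsetof(struct rt_record, length) == 4,  "length offset");
_Static_assert(offsetof(struct rt_record, offset) == 8,  "offset offset");
_Static_assert(sizeof(struct rt_record)           == 16, "record size");

/* Reads 'length' the way a foreign binding would: from raw bytes at a
   fixed offset, with no knowledge of the C struct definition. */
uint32_t read_length_like_binding(const void *record_bytes)
{
    uint32_t value;
    memcpy(&value, (const unsigned char *)record_bytes + 4, sizeof value);
    return value;
}
```

If the C side were compiled with different packing, the C code would still compile and type-check, but the binding would silently read the wrong bytes, which is why only the binary layout counts here.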
Marco van de Voort answered Oct 04 '22 04:10