I'm working on a project that's heavily multi-threaded, and was wondering if there's a way to have the compiler flag the use of non-reentrant calls to the C library (e.g. strtok instead of strtok_r)? If not, is there a list of calls that are non-reentrant so I can grep through my code base periodically?
A related question is whether there's a way to flag third-party library use of non-reentrant calls.
I'm assuming reentrancy implies thread-safety, but not necessarily the other way around. Is there a good reason to use non-reentrant calls in a threaded project?
On most systems, malloc and free are not reentrant, because they use a static data structure which records what memory blocks are free. As a result, no library functions that allocate or free memory are reentrant. This includes functions that allocate space to store a result.
Non-reentrant functions are functions that cannot safely be called, interrupted, and then recalled before the first call has finished without resulting in memory corruption.
Therefore, printf is also non-reentrant. As of C11, malloc and printf are both required to be thread-safe (and POSIX has required this since 2001). However, it's still not safe to use them from an async signal handler.
Why is it unsafe to call printf and malloc inside a signal handler? Application code can execute in the middle of a call to either printf or malloc as a result of an asynchronous signal. Neither function is required to be async-signal-safe, so it is unsafe to call them from the handler for an asynchronous signal.
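A common way around this is to do almost nothing in the handler: set a flag of type volatile sig_atomic_t and, if output is needed, use write(), which POSIX lists as async-signal-safe. Here is a minimal sketch under those assumptions. The names on_signal, install_handler and signal_seen are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Flag set by the handler and polled by normal code. */
static volatile sig_atomic_t got_signal = 0;

/* Handler restricted to async-signal-safe operations:
   no printf, no malloc, only a flag store and write(). */
static void on_signal(int signo)
{
    static const char msg[] = "signal received\n";
    int saved_errno = errno;            /* write() may clobber errno */
    ssize_t rc;

    (void)signo;
    got_signal = 1;
    rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;                           /* nothing safe to do on failure */
    errno = saved_errno;
}

/* Hypothetical helper: installs on_signal for signo, returns 0 on success. */
int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: lets the main loop check whether a signal arrived. */
int signal_seen(void)
{
    return got_signal != 0;
}
```

The main loop can then call signal_seen() and do the printf or malloc work itself, outside the handler.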
is there a list of calls that are non-reentrant so I can grep through my code base periodically?
I looked through the GNU libc function list and picked out the functions that have an _r variant. Here are the non-reentrant originals:
asctime, crypt, ctime, drand48, ecvt, encrypt, erand48, fcvt, fgetgrent, fgetpwent, getdate, getgrent, getgrgid, getgrnam, gethostbyaddr, gethostbyname2, gethostbyname, getmntent, getnetgrent, getpwent, getpwnam, getpwuid, getutent, getutid, getutline, gmtime, hcreate, hdestroy, hsearch, initstate, jrand48, lcong48, lgamma, lgammaf, lgammal, localtime, lrand48, mrand48, nrand48, ptsname, qecvt, qfcvt, rand, random, readdir64, readdir, seed48, setkey, setstate, srand48, srandom, strerror, strtok, tmpnam, ttyname
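To show what switching to an _r variant looks like, here is a small sketch that uses gmtime_r in place of gmtime. The helper name format_utc is hypothetical. The point is that the result lives in a struct tm owned by the caller rather than in the library's shared static buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats t as UTC "YYYY-MM-DD HH:MM:SS" into buf.
   Returns the number of characters written, or 0 on failure. */
size_t format_utc(time_t t, char *buf, size_t len)
{
    struct tm tm;   /* caller-owned result, not gmtime's static buffer */

    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm);
}
```

Each thread passes its own buffers, so concurrent calls do not overwrite each other's results.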
For source, you could possibly insist that every source file contains the line:
#include <beware.h>
after the C headers, and then the beware.h header file contains:
#define strtok unsafe_function_call_detected_strtok
#define getenv unsafe_function_call_detected_getenv
or some other suitable set of names that are unlikely to be real functions. That will result in compilation and/or linker errors.
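If you only build with GCC or Clang, a similar effect is available through #pragma GCC poison, which turns any later use of the listed identifiers into a compile error. This is a compiler-specific sketch, not portable C, and the function count_words is just a made-up example of code that still compiles because it uses strtok_r.

```c
#define _POSIX_C_SOURCE 200809L
/* All system headers first, as with the beware.h approach. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* From here on, mentioning these identifiers is a hard error
   (GCC and Clang only). */
#pragma GCC poison strtok getenv

/* Hypothetical example: counts whitespace-separated words in text,
   modifying text in place. */
size_t count_words(char *text)
{
    char *save = NULL;   /* explicit state instead of a hidden static */
    size_t n = 0;

    for (char *w = strtok_r(text, " \t\n", &save);
         w != NULL;
         w = strtok_r(NULL, " \t\n", &save)) {
        n++;
    }
    return n;
}
```

The pragma has to come after the system headers, since those headers declare the very names being poisoned.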
For libraries, it's a bit more difficult. You can look into using nm to extract all the unresolved names in each object file and ensure none of the unsafe ones are called.
This wouldn't be the compiler doing it but it would be easy enough to incorporate into the build scripts. See the following transcript:
$ cat qq.c
#include <stdio.h>
int main (int argc, char *argv[]) {
    printf ("Hello, world.\n");
    return 0;
}
$ gcc -c -o qq.o qq.c
$ nm qq.o
00000000 b .bss
00000000 d .data
00000000 r .rdata
00000000 t .text
U ___main
00000000 T _main
U _puts
You can see the unresolved symbols in that output with a U marker (and gcc has very sneakily decided to use puts instead of printf since I gave it a constant string with no formatting commands).
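As a rough sketch of how a build step could act on that output, the function below looks at one line of nm output and reports whether it is an undefined (U) reference to a name on a ban list. Everything here is an assumption for illustration: the name banned_undefined, the short ban list, and the heuristics of dropping one leading underscore (as in the transcript above) and any @version suffix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset; extend it with the names you care about. */
static const char *const banned[] = { "strtok", "gmtime", "localtime", "rand" };

/* Returns the banned name if line (one line of nm output) is an undefined
   reference to it, otherwise NULL. */
const char *banned_undefined(const char *line)
{
    char type;
    char sym[256];
    const char *name;

    /* Undefined entries have no address column: "   U name".
       Defined entries start with a hex address, so type is not 'U'. */
    if (sscanf(line, " %c %255s", &type, sym) != 2 || type != 'U')
        return NULL;

    sym[strcspn(sym, "@")] = '\0';          /* drop "@VERSION" if present */
    name = (sym[0] == '_') ? sym + 1 : sym; /* heuristic: one '_' prefix */

    for (size_t i = 0; i < sizeof banned / sizeof banned[0]; i++) {
        if (strcmp(name, banned[i]) == 0)
            return banned[i];
    }
    return NULL;
}
```

A small driver could read the output of nm line by line with fgets, call this function on each line, and exit non-zero if anything is reported, so the build fails when a library pulls in a banned call.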