
Is it possible to detect conflicting use of reserved identifiers in C?

Tags: c, gcc

According to the C standard, if a program defines or declares a reserved identifier, the behavior is undefined. One category of reserved identifiers is identifiers with external linkage defined in the C standard library.

As an example of a program with undefined behavior, consider the following: file1.c defines a variable named time with external linkage, which conflicts with the time function from the standard library, declared in time.h.

file1.c:

int time;

int foo( void )
{
    return time;
}

file2.c:

#include <time.h>
#include <stdio.h>

extern int foo( void );

int main( void )
{
    foo();
    printf( "current time = %ld\n", (long) time( NULL ) );
    return 0;
}

When the program is compiled and run, a segmentation fault occurs, because the time symbol referenced in file2.c gets linked to the time variable from file1.c rather than to the function in the C library, so main ends up calling the address of an int object as if it were a function.

$ gcc -c -o file1.o file1.c
$ gcc -c -o file2.o file2.c
$ gcc -o test file1.o file2.o 
$ ./test
Segmentation fault (core dumped)

I'm wondering if there is any way for GCC to detect the usage of conflicting, reserved identifiers in user code, at compile or link time. Here's my motivation: I'm working on an application where users can write C extensions to the application, which get compiled and linked to the rest of the application. If the user's C code uses reserved identifiers like the example above, the resulting program can fail in hard-to-predict ways.

One solution which comes to mind is to run something like nm on the user's object files, and compare the defined symbols against a list of reserved identifiers from the C library. However, I am hoping to find something in GCC which can detect the issue. Does anyone know if that is possible, or have any suggestions?
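The nm idea could be sketched roughly as follows. This is only a minimal illustration, not a complete tool: the function names (name_is_reserved, nm_line_is_clash, count_clashes) and the short reserved_names table are hypothetical, the input is assumed to be the default three-column output of GNU nm, and the target is assumed to be an ELF platform where object-file symbols carry no extra leading underscore.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset only (an assumption for this sketch); a real list
   would name every external identifier of the targeted C library. */
static const char *const reserved_names[] = {
    "time", "printf", "malloc", "free", "exit", "strlen", "memcpy"
};

/* Return 1 if name is in the list above or begins with an underscore.
   Every file-scope identifier that begins with '_' is reserved, and a
   symbol with external linkage is necessarily declared at file scope. */
int name_is_reserved(const char *name)
{
    size_t i;

    if (name[0] == '_')
        return 1;
    for (i = 0; i < sizeof reserved_names / sizeof reserved_names[0]; i++)
        if (strcmp(name, reserved_names[i]) == 0)
            return 1;
    return 0;
}

/* Inspect one line of nm output of the form "<address> <type> <name>".
   Return 1 if it is a global definition of a reserved name. */
int nm_line_is_clash(const char *line)
{
    char addr[32], type[8], name[256];

    /* Undefined symbols have no address column, so they do not produce
       three fields with a one-character type field. */
    if (sscanf(line, "%31s %7s %255s", addr, type, name) != 3)
        return 0;
    if (type[1] != '\0')
        return 0;
    /* nm prints an uppercase type letter for global symbols. */
    if (!isupper((unsigned char)type[0]))
        return 0;
    return name_is_reserved(name);
}

/* Read nm output from in, report each clash on stderr, and return the
   number of clashes found. */
int count_clashes(FILE *in)
{
    char line[512];
    int clashes = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (nm_line_is_clash(line)) {
            line[strcspn(line, "\n")] = '\0';
            fprintf(stderr, "reserved identifier defined: %s\n", line);
            clashes++;
        }
    }
    return clashes;
}
```

A small main that returns count_clashes(stdin) != 0 could then be fed the output of nm --defined-only on each user object file, so that a nonzero exit status fails the build.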

rkuczwara asked Oct 26 '18 15:10



2 Answers

I'm wondering if there is any way for GCC to detect the usage of conflicting, reserved identifiers in user code, at compile or link time.

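This adds detail to @PSkocik's good answer.
One way to detect many conflicts is to include all of the standard header files, so that the compiler reports any user declaration that conflicts with a library declaration. Compilation times may increase noticeably.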

Determine the standard version

#if defined(__STDC__)
/* ANSI C89 and ISO C90 are the same language; neither defines __STDC_VERSION__ */
# define STANDARD_C89
# define STANDARD_C90
# if defined(__STDC_VERSION__)
#  if (__STDC_VERSION__ >= 199409L)
#   define STANDARD_C95
#  endif
#  if (__STDC_VERSION__ >= 199901L)
#   define STANDARD_C99
#  endif
#  if (__STDC_VERSION__ >= 201112L)
#   define STANDARD_C11
#  endif
#  if (__STDC_VERSION__ >= 201710L)
#   define STANDARD_C18
#  endif
# endif
#endif

Include them, some selectively.

#include <assert.h>
//#include <complex.h>
#include <ctype.h>
#include <errno.h>
//#include <fenv.h>
#include <float.h>
//#include <inttypes.h>
//#include <iso646.h>
#include <limits.h>
#include <locale.h>
#include <math.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
//#include <stdalign.h>
//#include <stdatomic.h>
//#include <stdbool.h>
#include <stddef.h>
//#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
//#include <stdnoreturn.h>
#include <string.h>
//#include <tgmath.h>
//#include <threads.h>
#include <time.h>
//#include <uchar.h>
//#include <wchar.h>
//#include <wctype.h>

//////////////////////////////
#ifdef STANDARD_C95
#include <iso646.h>
#include <wchar.h>
#include <wctype.h>
#endif

//////////////////////////////
#ifdef STANDARD_C99
#ifndef __STDC_NO_COMPLEX__
#include <complex.h>
#endif
#include <fenv.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <tgmath.h>
#endif

//////////////////////////////
#ifdef STANDARD_C11
#include <stdalign.h>
#ifndef __STDC_NO_ATOMICS__
#include <stdatomic.h>
#endif
#ifndef __STDC_NO_THREADS__
#include <threads.h>
#endif
#include <stdnoreturn.h>
#include <uchar.h>
#endif

I am certain the above needs some refinements and would appreciate advice on that.


To avoid adding names such as STANDARD_C11 to the name space, test the predefined macros directly instead of defining new ones:

// #ifdef STANDARD_C11
//  ... C11 includes
// #endif

#if defined(__STDC__)
# if defined(__STDC_VERSION__)
#  if (__STDC_VERSION__ >= 201112L)
     /* ... C11 includes */
#  endif
# endif
#endif

Although the goal is "According to the C standard ...", additional code may be needed to accommodate popular compiler extensions and slight variations from the standard.

chux - Reinstate Monica answered Nov 15 '22 15:11


You could take a libc implementation that can be linked statically and link all of it into your object files with -Wl,--whole-archive.

main.c:

int time=42;
int main(){}

Link it with the whole libc:

$ musl-gcc main.c -static -Wl,--whole-archive

If you get a multiple definition error or a type/size/alignment of symbol changed warning, you're clashing with your libc.

/usr/local/bin/ld: /usr/local/musl/lib/libc.a(time.lo): in function `time':
/home/petr/f/proj/bxdeps/musl/src/time/time.c:5: multiple definition of `time'; /tmp/cc3bL3pP.o:(.data+0x0): first defined here

Alternatively (and more robustly), you could preinclude an all-of-C (or all-of-POSIX) header and have the compiler tell you where you are clashing with it. I'd do that only once in a while, since it will somewhat pessimize your build times, although even including all of POSIX generally isn't as bad as including a single C++ header.
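As a rough sketch of what the preinclude check catches, the file below uses a single header, time.h, as a stand-in for the full all-of-C header. The file name clash_demo.c, the macro DEMONSTRATE_CLASH, and the variable elapsed_ticks are hypothetical names chosen for illustration. Compiled normally the file is accepted; compiled with -DDEMONSTRATE_CLASH, the compiler should reject it with a redeclaration error, because time is then declared both as a function (by the header) and as an int object (by the user code).

```c
/* clash_demo.c (hypothetical file name) */
#include <time.h>   /* stands in for the preincluded all-of-C header */

#ifdef DEMONSTRATE_CLASH
int time;                        /* conflicts with time() from <time.h> */
#else
static int elapsed_ticks = 42;   /* a non-reserved name compiles cleanly */
#endif

int foo(void)
{
#ifdef DEMONSTRATE_CLASH
    return time;
#else
    return elapsed_ticks;
#endif
}
```

With GCC, the header does not have to be named in the user's source at all: the -include option processes a file as if it were included on the first line, so something like gcc -include allc.h -c user.c (where allc.h is an assumed file holding the full include list) applies the same check to unmodified user code.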

PSkocik answered Nov 15 '22 16:11