Why does a global variable definition in a C header file work? [duplicate]

Tags:

c

From what I saw across many Stack Overflow questions, among other places, the way to define globals is to define them in exactly one .c file, then declare them as extern in a header file, which then gets included in the .c files that need them.

However, today I saw a global variable defined in a header file in a codebase. I argued that this was wrong, but he insisted it would work. I had no idea why, so I created a small project to test it out real quick:

a.c

#include <stdio.h>
#include "a.h"

int main(void)
{
    p1.x = 5;
    p1.y = 4;
    com = 6;
    change();
    printf("p1 = %d, %d\ncom = %d\n", p1.x, p1.y, com);
    return 0;
}

b.c

#include "a.h"

void change(void)
{
    p1.x = 7;
    p1.y = 9;
    com = 1;
}

a.h

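#ifndef A_H
#define A_H

typedef struct coord {
    int x;
    int y;
} coord;

/* Defined (not just declared) in the header: the subject of this question. */
coord p1;
int com;

void change(void);

#endif /* A_H */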

Makefile

all:
	gcc -c a.c -o a.o
	gcc -c b.c -o b.o
	gcc a.o b.o -o run.out

clean:
	rm a.o b.o run.out

Output

p1 = 7, 9
com = 1

How is this working? Is this an artifact of the way I've set up the test? Is it that newer gcc has managed to catch this condition? Or is my interpretation of the whole thing completely wrong? Please help...

asked Feb 04 '21 11:02 by Adeesh Lemonickous



2 Answers

This relies on so-called "common symbols", which are an extension of standard C's notion of tentative definitions (https://port70.net/~nsz/c/c11/n1570.html#6.9.2p2). The standard only merges tentative definitions within a single translation unit, but most UNIX linkers make common symbols work across translation units too (and many even across shared dynamic libraries).

AFAIK, the feature has existed pretty much forever and has something to do with Fortran compatibility (the name comes from Fortran's COMMON blocks).

It works by the compiler giving uninitialized (tentative) globals a special "common" category (shown by the nm utility as "C", which stands for "common").

Example of data symbol categories:

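#!/bin/sh -eu
# -fcommon is needed on GCC 10 and later, where -fno-common is the default;
# without it, common_symbol is emitted as B instead of C.
gcc -fcommon -xc - -c -o data_symbol_types.o <<'EOF'
int common_symbol;          //C: common (tentative definition)
int zero_init_symbol = 0;   //B: zero-initialized data (.bss)
int data_init_symbol = 4;   //D: initialized data (.data)
const int const_symbol = 4; //R: read-only data (.rodata)
EOF
nm data_symbol_types.o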

Output:

0000000000000004 C common_symbol
0000000000000000 R const_symbol
0000000000000000 D data_init_symbol
0000000000000000 B zero_init_symbol

Whenever a linker sees multiple definitions of a particular symbol, it usually generates linker errors.

But when those definitions are all in the common category, the linker merges them into one. Also, if there are N-1 common definitions of a particular symbol and one non-tentative definition (in the R, D, or B category), then all of them are merged into the single non-tentative definition, and again no error is generated.

In other cases you get symbol redefinition errors.

Although common symbols are widely supported, they aren't standard C: having more than one external definition of the same object is technically undefined behavior (C11 6.9p5; Annex J.5.11 lists accepting it as a common extension), even though in practice it often works.

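Recent clang versions (11 and later) and, as far as I've noticed, tinycc do not generate common symbols by default (there you should get a redefinition error). On gcc, common symbol generation can be disabled with -fno-common, which has been the default since GCC 10; -fcommon restores the old behavior.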

(Ian Lance Taylor's series on linkers has more info on common symbols; it also mentions how linkers even allow merging differently sized common symbols, using the largest size for the final object: https://www.airs.com/blog/archives/42. I believe this weird trick was once used by libcs to some effect.)

answered Oct 23 '22 22:10 by PSkocik


That program should not build: it will compile, but you'll get double-definition errors in the linking phase, because of how the variables are defined in your header file.

A header file informs the compiler about the external environment that it cannot guess by itself, such as variables defined in other modules.

Since your question deals with this, I'll try to explain the correct way to define a global variable in one module and how to inform the compiler about it in other modules.

Let's say you have a module A.c with some variable defined in it:

A.c

int I_am_a_global_variable;  /* you can even initialize it */

Normally, to let the compiler know, when compiling other modules, that the variable is defined elsewhere, you write something like this (the trick is the extern keyword, which says that the variable is not defined here):

B.c

extern int I_am_a_global_variable; /* you cannot initialize it, as it is defined elsewhere */

As this is a property of the module A.c, we can write an A.h file stating that somewhere else in the program there's a variable named I_am_a_global_variable of type int, so that other modules can access it.

A.h

extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */

Instead of declaring it in B.c, we can include A.h in B.c to ensure that the variable is declared exactly as the author of A.c intended.

So now B.c is:

B.c

#include "A.h"
void some_function(void) {
    /* ... */
    I_am_a_global_variable = 42;  /* stands in for some complicated expression */
}

This ensures that if the author of A.c decides to change the type or the declaration of the variable, he can do so by changing A.h, and all the files that #include it will be recompiled (you can arrange this in the Makefile, for example).

A.c

#include "A.h"   /* extern int I_am_a_global_variable; */
int I_am_a_global_variable = 23;

To prevent errors, it is good for A.c to #include A.h as well, so that the declaration

extern int I_am_a_global_variable; /* as above, you cannot initialize the variable here */

and the final definition (in A.c):

int I_am_a_global_variable = 23; /* I have initialized it to a non-default value to show how to do it */

are checked for consistency. Suppose the author changes the type of I_am_a_global_variable to double and forgets to change the declaration in A.h: the compiler will complain about the mismatched declaration and definition when compiling A.c (which now includes A.h).

Why do I say that you will get double-definition errors when linking?

Well, suppose you compile several modules that each contain these statements (which is what happens when a header containing a definition is #included in several modules):

#include "A.h" /* this has an extern int I_am_a_global_variable; that informs the
                * compiler that the variable is defined elsewhere, but see below */
int I_am_a_global_variable; /* here is _elsewhere_ :) */

then all those modules will have a global variable I_am_a_global_variable, initialized to 0, because the compiler defined it in every module: you are not saying that the variable is defined elsewhere, you are telling the compiler to declare and define it in this compilation unit. When you link all the modules together, you end up with several definitions of a variable with the same name in several places, and the references from other modules that use this variable won't know which one to use.

The compiler knows nothing about the other compilations of an application when it is compiling module A, so you need some means to tell it what is happening around it. Just as you use function prototypes to tell it that there's a function somewhere that takes some number of arguments of types A, B, C, etc. and returns a value of type Z, you need to tell it that there's a variable defined elsewhere that has type X, so that all the accesses you make to it in this module are compiled correctly.

answered Oct 24 '22 00:10 by Luis Colorado