I am reading this code from here(in Chinese). There is one piece of code about testing global variable in C. The variable a
has been defined in the file t.h
which has been included twice. In file foo.c
defined a struct b
with some value and a main
function. In main.c
file, defined two variables without initialized.
/* t.h */
#ifndef _H_
#define _H_
int a;
#endif
/* foo.c */
#include <stdio.h>
#include "t.h"
struct {
char a;
int b;
} b = { 2, 4 };
int main();
void foo()
{
printf("foo:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\tsizeof(b)=%d\n\tb.a=%d\n\tb.b=%d\n\tmain:0x%08x\n",
&a, &b, sizeof b, b.a, b.b, main);
}
/* main.c */
#include <stdio.h>
#include "t.h"
int b;
int c;
int main()
{
foo();
printf("main:\t(&a)=0x%08x\n\t(&b)=0x%08x\n
\t(&c)=0x%08x\n\tsize(b)=%d\n\tb=%d\n\tc=%d\n",
&a, &b, &c, sizeof b, b, c);
return 0;
}
After using Ubuntu GCC 4.4.3 compiling, the result is like this below:
foo: (&a)=0x0804a024
(&b)=0x0804a014
sizeof(b)=8
b.a=2
b.b=4
main:0x080483e4
main: (&a)=0x0804a024
(&b)=0x0804a014
(&c)=0x0804a028
size(b)=4
b=2
c=0
Variable a
and b
has the same address in two function, but the size of b
has changed. I can't understand how it worked!
A global variable is accessible to all functions in every source file where it is declared. To avoid problems: Initialization — if a global variable is declared in more than one source file in a library, it should be initialized in only one place or you will get a compiler error.
There is no problem if it is inside a function as it is a local variable. If it is a global declaration, it depends on the linker. You can get a link error (or if you use options to direct it to do so). Otherwise they will be consolidated into one location as if you had declared them external.
In C, multiple global variables are "merged" into one. So you have indeed just one global variable, declared multiple times.
You are violating C's "one definition rule", and the result is undefined behavior. The "one definition rule" is not formally stated in the standard as such. We are looking at objects in different source files (aka, translation units), so we concerned with "external definitions". The "one external definition" semantic is spelled out (C11 6.9 p5):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a
sizeof
or_Alignof
operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
Which basically means you are only allowed to define an object at most once. (The otherwise clause allows you to not define an external object at all if it is never used anywhere in the program.)
Note that you have two external definitions for b
. One is the structure that you initialize in foo.c
, and the other is the tentative definition in main.c
, (C11 6.9.2 p1-2):
If the declaration of an identifier for an object has file scope and an initializer, the declaration is an external definition for the identifier.
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier
static
, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
So you have multiple definitions of b
. However, there is another error, in that you have defined b
with different types. First note that multiple declarations to the same object with external linkage is allowed. However, when the same name is used in two different source files, that name refers to the same object (C11 6.2.2 p2):
In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function.
C puts a strict limitation on declarations to the same object (C11 6.2.7 p2):
All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
Since the types for b
in each of your source files do not actually match, the behavior is undefined. (What constitutes a compatible type is described in detail in all of C11 6.2.7, but it basically boils down to being that the types have to match.)
So you have two failings for b
:
Technically, your declaration of int a
in both of your source files also violates the "one definition rule". Note that a
has external linkage (C11 6.2.2 p5):
If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.
But, from the quote from C11 6.9.2 earlier, those int a
tentative definitions are external definitions, and you are only allowed one of those from the quote from C11 6.9 at the top.
The usual disclaimers apply for undefined behavior. Anything can happen, and that would include the behavior you observed.
A common extension to C is to allow multiple external definitions, and is described in the C standard in the informative Annex J.5 (C11 J.5.11):
There may be more than one external definition for the identifier of an object, with or without the explicit use of the keyword
extern
; if the definitions disagree, or more than one is initialized, the behavior is undefined (6.9.2).
(Emphasis is mine.) Since the definitions for a
agree, there is no harm there, but the definitions for b
do not agree. This extension explains why your compiler does not complain about the presence of multiple definitions. From the quote of C11 6.2.2, the linker will attempt to reconcile the multiple references to the same object.
Linkers typically use one of two models for reconciling multiple definitions of the same symbol in multiple translation units. These are the "Common Model" and the "Ref/Def Model". In the "Common Model", multiple objects with the same name are folded into a single object in a union
style manner so that the object takes on the size of the largest definition. In the "Ref/Def Model", each external name must have exactly one definition.
The GNU toolchain uses the "Common Model" by default, and a "Relaxed Ref/Def Model", where it enforces a strictly one definition rule for a single translation unit, but does not complain about violations across multiple translation units.
The "Common Model" can be suppressed in the GNU compiler by using the -fno-common
option. When I tested this on my system, it caused "Strict Ref/Def Model" behavior for code similar to yours:
$ cat a.c
#include <stdio.h>
int a;
struct { char a; int b; } b = { 2, 4 };
void foo () { printf("%zu\n", sizeof(b)); }
$ cat b.c
#include <stdio.h>
extern void foo();
int a, b;
int main () { printf("%zu\n", sizeof(b)); foo(); }
$ gcc -fno-common a.c b.c
/tmp/ccd4fSOL.o:(.bss+0x0): multiple definition of `a'
/tmp/ccMoQ72v.o:(.bss+0x0): first defined here
/tmp/ccd4fSOL.o:(.bss+0x4): multiple definition of `b'
/tmp/ccMoQ72v.o:(.data+0x0): first defined here
/usr/bin/ld: Warning: size of symbol `b' changed from 8 in /tmp/ccMoQ72v.o to 4 in /tmp/ccd4fSOL.o
collect2: ld returned 1 exit status
$
I personally feel the last warning issued by the linker should always be provided regardless of the resolution model for multiple object definitions, but that is neither here nor there.
References:
Unfortunately, I can't give you the link to my copy of the C11 Standard
What are extern
variables in C?
The "Beginner's Guide to Linkers"
SAS Documentation on External Variable Models
Formally, it is illegal to define the same variable (or function) with external linkage more than once. So, from the formal point of view the behavior of your program is undefined.
Practically, allowing multiple definitions of the same variable with external linkage is a popular compiler extension (a common extension, mentioned as such in the language specification). However, in order to be used properly, each definition shall declare it with the same type. And no more than one definition shall include initializer.
Your case does not match the common extension description. Your code compiles as a side effect of that common extension, but its behavior is still undefined.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With