Clarification on difference in ODR rules for structs in C and C++

Tags: c++, c, types, language-lawyer, linkage

I am aware of how ODR, linkage, static, and extern "C" work with functions. But I am not sure about visibility of types since they cannot be declared static and there are no anonymous namespaces in C.

In particular, I would like to know whether the following code is valid when compiled as C and as C++:

// A.{c,cpp}
typedef struct foo_t{
    int x;
    int y;
} Foo;

static int use_foo() 
{ 
    Foo f;
    f.x=5;
    return f.x;
}

// B.{c,cpp}
typedef struct foo_t{
    double x;
} Foo;

static int use_foo() 
{ 
    Foo f;
    f.x=5.0;
    return f.x; // implicit double-to-int conversion, on purpose
}

using the following two commands (I know both compilers autodetect the language from the file extension, hence the different names):

g++ -std=c++17 -pedantic -Wall -Wextra a.cpp b.cpp
gcc -std=c11 -pedantic -Wall -Wextra a.c b.c

Version 8.3 of each compiler happily compiles both without any errors. Clearly, if both struct names have external linkage, there is an ODR violation because the definitions are not identical. Yes, the compiler is not required to report it, and neither did, hence my question.

Is it a valid C++ program?

I do not think so, that is what anonymous namespaces are for.

Is it a valid C program?

I am not sure here. I have read that types are considered static, which would make the program valid. Can someone please confirm?

C, C++ Compatibility

If these definitions were in public header files, perhaps in different C libraries, and a C++ program included both, each in a different TU, would that be an ODR violation? How can one prevent this? Does extern "C" play any role?

asked Oct 20 '21 08:10

Quimby

3 Answers

For references, I will use the n1570 draft (C11) for the C language and the n4860 draft (C++20) for the C++ language.

C language

Types have no linkage in C: 6.2.2 Linkages of identifiers §6:

The following identifiers have no linkage: an identifier declared to be anything other than an object or a function...

That means that the types used in a.c and b.c are unrelated: you correctly declare two different types, one in each translation unit.

C++ language

Types do have linkage in C++. 6.6 Program and linkage [basic.link] says (emphasis mine):
- §2:
A name is said to have linkage when it might denote the same object, reference, function, type, template, namespace or value as a name introduced by a declaration in another scope
- §4:
An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has internal linkage. All other namespaces have external linkage. A name having namespace scope that has not been given internal linkage above and that is the name of
...
a named class...
...
has its linkage determined as follows:
— if the enclosing namespace has internal linkage, the name has internal linkage;
— otherwise, if the declaration of the name is attached to a named module (10.1) and is not exported (10.2), the name has module linkage;
— otherwise, the name has external linkage

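The types declared in a.cpp and b.cpp share the same name with external linkage but have different definitions. That violates the one-definition rule, so the program is ill-formed, even though the compiler is not required to diagnose it.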

That being said, most common compilers are able to compile either C or C++ sources, and I would bet a coin that they try hard to share most of the implementation between the two languages. For that reason, I would trust real-world implementations to produce the expected results even for C++. But Undefined Behaviour does not forbid expected results...
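As the question hints, the usual C++ remedy is an unnamed namespace, which by the §4 wording quoted above gives the type internal linkage. A minimal sketch of what a.cpp could look like (my illustration rather than code from the question, and only meaningful when the file is compiled as C++):

```cpp
// a.cpp, C++ only: a sketch of keeping Foo private to this translation unit.
namespace {

// Inside an unnamed namespace the class name foo_t has internal linkage,
// so a different foo_t in b.cpp names a different, unrelated type.
typedef struct foo_t {
    int x;
    int y;
} Foo;

}  // namespace

static int use_foo() {
    Foo f;
    f.x = 5;
    return f.x;
}
```

b.cpp would wrap its own definition the same way. This does not help for headers that must also compile as C, where unnamed namespaces do not exist.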

answered Oct 23 '22 10:10

Serge Ballesta

For C, the program is valid. The only requirement that applies here is the "strict aliasing rule", which says that an object can be accessed only via an lvalue of a compatible type (plus a few exceptions described in 6.5p7).

The compatibility of structures/unions defined in separate translation units is defined in 6.2.7p1.

... two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.

Therefore the structures are not compatible in the example.

However, that is not an issue here, because the f object is created and accessed via the locally defined type. UB would be invoked if the object were created with the Foo type defined in one translation unit and accessed via the other Foo type in the other translation unit:

// A.c
typedef struct foo_t{
    int x;
    int y;
} Foo;

void bar(void *f);

void foo() 
{ 
    Foo f;
    bar(&f);
}

// B.c
typedef struct foo_t{
    double x;
} Foo;

// using void* to avoid passing pointer to incompatible types
void bar(void *f_) 
{ 
    Foo *f = f_;
    f->x=5.0; // UB!
}

answered Oct 23 '22 11:10

tstanisl

Other answers point out that this is an ill-formed program in C++.

In practice, link errors on overloaded functions would be possible if you have two separate definitions of (non-static) void foo(bar); in separate translation units. I expect this is (part of) why C++ has this rule that (some) types have external linkage.

If types were truly private, those wouldn't conflict. But they'll name-mangle the same way, because if both TUs do have the same definition of the type bar (e.g. via a .h or manual copying), they need to resolve to calling the same function.

// A.cpp
typedef struct foo{  // names ending with _t are reserved by POSIX
    int x;
    int y;
} Foo;

int take_foo(Foo f) {
    return f.x;
}

int main(){}  // so it's linkable without special options like -nostdlib and linker entry-point defaults

// B.{c,cpp}
typedef struct foo{
    double x;
} Foo;

double take_foo(Foo f) {
    return f.x;
}

In case it matters, these functions will compile to different machine code on some targets, including x86-64 System V ABI where I tested it. (The first double arg is already in the return-value register, even if inside a struct containing only a couple doubles. But unlike ARM64 and some other RISCs, the first integer arg is not passed in the return-value register, so a mov is required before the ret.)

$ g++ [AB].cpp
/usr/bin/ld: /tmp/ccM89kvx.o: in function `take_foo(foo)':
B.cpp:(.text+0x0): multiple definition of `take_foo(foo)'; /tmp/cckZ5qRG.o:A.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

There's no error if the functions or the struct tags have different names. (And yes, I compiled with optimization disabled, and no link-time optimization, so nothing had a chance to remove unused functions before they conflicted.)

However, just changing the typedef name without changing the struct tag isn't sufficient. That makes sense; all typedefs for the same type need to resolve to the same asm name, so GCC mangles based on the struct tag even if you don't use it directly. Note the linker error messages demangling it back to take_foo(foo) not Foo.
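That a typedef is only another name for the struct type can be checked at compile time. A small sketch, where OtherName is a made-up alias used purely for illustration:

```cpp
#include <type_traits>

typedef struct foo {
    int x;
    int y;
} Foo;

typedef foo OtherName;  // hypothetical second alias for the same type

// All three names denote one type, so they cannot be mangled differently.
static_assert(std::is_same<Foo, foo>::value, "Foo is an alias for struct foo");
static_assert(std::is_same<Foo, OtherName>::value, "both aliases name one type");

int take_foo(Foo f) {
    return f.x;
}

// Declaring take_foo through the other alias redeclares the same function;
// it is not a new overload.
int take_foo(OtherName f);
```

Since both declarations refer to the same function, renaming only the typedef in one translation unit leaves the symbol name unchanged.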

I didn't go through the standard wording to see if two typedef ... Foo would be legal in ISO C++, despite not being a problem in practice for real-world C++ implementations.

Making either function static would fix the problem, too, because it's fine for static functions to have the same asm name.

This would also be a linker error if compiled as C. C has no function overloading, so two non-static take_foo functions in the same program already conflict, regardless of whether their arguments are structs with the same tag name.