
What is the point of internal linkage in C++?

Tags: c++, linkage

I understand that there are three possible linkage values for a variable in C++ - no linkage, internal linkage and external linkage.

So external linkage means that the variable identifier is accessible in multiple files, and internal linkage means that it is accessible within the same file. But what is the point of internal linkage? Why not just have two possible linkages for an identifier - no linkage and external linkage? To me it seems like global (or file) scope and internal linkage serve the same purpose.

Is there any use case where internal linkage is actually useful that is not covered by global scope?

In the example below, I have two pieces of code: the first redeclares the static int i11 (which has internal linkage) inside main with extern, and the second does not. Both do pretty much the same thing, since main already has access to i11 through its file scope. So why have a separate kind of linkage called internal linkage?

    #include <iostream>

    static int i11 = 10;

    int main()
    {
        extern int i11; // redeclares the file-scope i11; linkage stays internal
        std::cout << ::i11;
        return 0;
    }

gives the same result as

    #include <iostream>

    static int i11 = 10;

    int main()
    {
        std::cout << ::i11;
        return 0;
    }

EDIT: Just to add more clarity: as per HolyBlackCat's definition below, internal linkage really means you can forward-declare a variable within the same translation unit. But why would you even need to do that for a variable that is already globally accessible within the file? Is there any use case for this feature?

programmerravi asked Jan 11 '18 23:01



1 Answer

Examples of each:

External linkage:

foo.h

    extern int foo; // Declaration

foo.cpp

    int foo = 42; // Definition (external linkage is the default here)

bar.cpp

    #include "foo.h"

    int bar() { return foo; } // Use

Internal linkage:

foo.cpp

    static int foo = 42; // No relation to foo in bar.cpp


bar.cpp

    static int foo = -43; // No relation to foo in foo.cpp
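
A side note that is not part of the example above: besides static, an unnamed namespace is the other common way to give a name internal linkage (since C++11). The sketch below shows both in one file; the names twice, bar, and combined are hypothetical and exist only for illustration.

```cpp
// One translation unit; nothing declared here is visible to other files,
// except combined(), which has external linkage.

namespace {
int foo = 42;                       // internal linkage via the unnamed namespace
int twice(int x) { return 2 * x; }  // helper, also private to this file
}  // namespace

static int bar = -43;               // internal linkage via static

// External linkage: other files may declare and call this function.
int combined() { return twice(foo) + bar; }
```

Another file could define its own foo, twice, or bar without any conflict at link time, which is the practical point of internal linkage.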

No linkage:

foo.cpp

    int foo1() { static int foo = 42; foo++; return foo; }
    int foo2() { static int foo = -43; foo++; return foo; }

The foo variables in foo1 and foo2 must each have storage, which in practice means each needs a name at the assembler and linker level. Those names must not conflict with each other and must not be accessible to any other code. The C++ standard expresses this as "no linkage." The term covers a few other cases as well, where it is less obvious what the storage would be used for. (For a local class, for example, you can imagine the vtable needing storage; for a typedef, it is mostly a matter of language-specification minutiae about the scope in which the name is accessible.)
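
A minimal sketch of the no-linkage case, reusing foo1 and foo2 from the example above: each function keeps its own foo, and the two never interfere even though they share a name.

```cpp
// Each function owns a separate variable named foo. Neither name has linkage,
// so no other function or file can refer to either of them by name.
int foo1() { static int foo = 42;  foo++; return foo; }
int foo2() { static int foo = -43; foo++; return foo; }
```

Calling foo1 twice and then foo2 once yields 43, 44, and -42: the state persists across calls, but only inside the function that declares it.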

C++ specifies something of a least-common-denominator linkage model that can be mapped onto the richer models of actual linkers on actual platforms. In practice the mapping is highly imperfect, and lots of real systems end up using attributes, pragmas, or compiler flags to gain finer control over linkage. To do this and still provide a reasonably useful language, one gets into name mangling and other compiler techniques. If C++ ever tried to provide a greater degree of compiled-code interop, as the Java or .NET virtual machines do, the language would very likely gain clearer and more elaborate control over linkage.

EDIT: To more clearly answer the question... The standard has to define how this works for both access to identifiers in the source language and linkage of compiled code. The definition must be strong enough so that correctly written code never produces errors for things being undefined or multiply defined. There are certainly better ways to do this than C++ uses, but it is largely an evolved language and the specification is somewhat influenced by the substrate it is compiled onto. In effect the three different types of linkage are:

  • External linkage: The entire program agrees on this name, and it can be accessed anywhere a declaration is visible.
  • Internal linkage: A single translation unit (roughly, one source file plus everything it includes) agrees on this name, and it can be accessed in any scope where the declaration is visible.
  • No linkage: The name belongs to one scope only and can only be accessed within that scope.

In the generated assembly, these tend to map to a global declaration, a file-local declaration, and a file-local declaration with a synthesized unique name.
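
To make the three kinds concrete, here is a small sketch that puts one of each in a single file. All names (shared_total, file_calls, next_id, calls_so_far) are hypothetical and chosen only for illustration.

```cpp
// linkage_kinds.cpp (hypothetical file name)

int shared_total = 0;        // external linkage: one entity for the whole program

static int file_calls = 0;   // internal linkage: private to this translation unit

int next_id() {
    static int last_id = 0;  // no linkage: nameable only inside next_id()
    ++file_calls;
    ++shared_total;
    return ++last_id;
}

// Exposes the file-private counter through a function with external linkage.
int calls_so_far() { return file_calls; }
```

If you compile this to an object file and inspect it with a tool such as nm, you would typically expect shared_total to show up as a global symbol, file_calls as a local one, and last_id as a local symbol whose mangled name embeds the enclosing function. The exact output depends on the compiler and platform.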

Linkage is also relevant when the same name is declared with different linkage in different parts of the program, and in determining what extern int foo refers to from a given place.
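
As a sketch of that last point, assuming a single file in which a static foo is already visible: a block-scope extern declaration does not create a new external foo, it redeclares the existing one and keeps its internal linkage. The function name read_foo is hypothetical.

```cpp
static int foo = 42;   // internal linkage

int read_foo() {
    extern int foo;    // refers to the static foo above; linkage stays internal
    return foo;
}
```

Without the static foo in scope, the same extern int foo would instead refer to a foo with external linkage that some other file has to define. This is the same situation as the i11 example in the question.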

Zalman Stern answered Oct 08 '22 11:10
