Why does this C program compile with no errors?

Tags: c, compilation

I have the two C files, main.c and weird.c:

// main.c
#include <stdio.h>

int weird(int *);

int
main(void)
{
    int x, *y;

    y = (int *)7;
    x = weird(y);
    printf("x = %d\n", x);
    return (0);
}

// weird.c

char *weird = "weird";

However, when I run the following:

clang -Wall -Wextra -c main.c
clang -Wall -Wextra -c weird.c
clang -o program main.o weird.o

I do not get any errors. Why is this? Shouldn't there at least be linking errors? Note that I am just talking about compiling the files — not running them. Running gives a segmentation fault.

asked Apr 29 '16 01:04 by user328950



3 Answers

Should there be a linker error?

The short answer to "Shouldn't there at least be linking errors?" is "There is no guarantee that there'll be a linking error". The C standard doesn't mandate it.

As Raymond Chen noted in a comment:

The language-lawyer answer is that the standard does not require a diagnostic for this error. The practical answer is that C does not type-decorate symbols with external linkage, so the type mismatch goes undetected.

One of the reasons C++ has type-safe linkage is to avoid problems with code analogous to this (though the main reason is to allow for function name overloading — resolving this sort of problem is, perhaps, more a side-effect).

The C standard says:

§6.9 External definitions

¶5 An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

§5.1.1.1 Program structure

¶1 A C program need not all be translated at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this International Standard. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

§5.1.1.2 Translation phases

  8. All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.

The linking is done based on the names of external definitions, not on the types of the objects identified by the name. The onus is on the programmer to ensure that the type of the function or object for each external definition is consistent with the way it is used.
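Since the linker matches names only, a type mismatch can be caught only inside a single translation unit, where the compiler sees every declaration of an identifier at once. A minimal sketch, assuming the declaration that a header would normally supply is written directly at the top of the file that defines weird (this single-file layout is illustrative, not taken from the question):

```c
/* Sketch: one translation unit containing both a declaration of weird
 * (as a header would supply) and its definition, so the compiler can
 * check that the two agree. */
#include <stddef.h>

int weird(int *p);      /* declaration, normally provided by a header */

/* Definition; it must be compatible with the declaration above. */
int weird(int *p)
{
    return p == NULL ? 42 : 99;
}

/* If the definition above were replaced by
 *     char *weird = "weird";
 * this file would no longer compile, because the compiler would see
 * weird declared both as a function and as a variable. Split across two
 * files, as in the question, the same mismatch goes unnoticed. */
```

Compiled on its own (for example with gcc -c), this file is accepted; the closing comment describes the variant the compiler would reject.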


Avoiding the problem

In a comment, I said:

This [question] is an argument for making use of headers to ensure that different parts of a program are coherent. If you never declare an external function in a source file but only in headers, and use the headers wherever the relevant symbol (in this case weird) is used or defined, then the code would not all compile. You could either have a function or a string, but not both. You'd have a header weird.h which contains either extern char *weird; or extern int weird(int *p); (but not both), and both main.c and weird.c would include the header, and only one of them would compile successfully.

To which there came the response:

What could I add to these files to ensure that the error is detected and thrown when main.c is compiled?

You'd create 3 source files. The code shown here is slightly more complicated than you'd normally use because it allows you to use conditional compilation to compile the code with either a function or a variable as the 'external identifier with external linkage' called weird. Normally, you'd select one intended representation for weird and only allow that to be exposed.

weird.h

#ifndef WEIRD_H_INCLUDED
#define WEIRD_H_INCLUDED

#ifdef USE_WEIRD_STRING
extern const char *weird;
#else
extern int weird(int *p);
#endif

#endif /* WEIRD_H_INCLUDED */

main.c

#include <stdio.h>
#include "weird.h"

int main(void)
{
    int x, *y;

    y = (int *)7;
    x = weird(y);
    printf("x = %d\n", x);
    return (0);
}

weird.c

#include "weird.h"

#ifdef USE_WEIRD_STRING
const char *weird = "weird";
#else
int weird(int *p)
{
    if (p == 0)
        return 42;
    else
        return 99;
}
#endif

Valid compilation sequences

gcc -c weird.c
gcc -c main.c
gcc -o program weird.o main.o

gcc -o program -DUSE_WEIRD_FUNCTION main.c weird.c

Both these work because the code is compiled to use the weird() function. The header, in both cases, ensures that the compilations are consistent.

Invalid compilation sequence

gcc -c -DUSE_WEIRD_STRING weird.c
gcc -c main.c
gcc -o program weird.o main.o

This is basically the same as the setup in the question. The weird.c file is compiled to create a pointer variable called weird that points to the string "weird", but the main.c code is compiled expecting to use a function weird(). The linker does link the code, but things go disastrously wrong when the function call in main() is retargeted to the address of the variable weird. The chances are that the memory where that variable is stored is not executable, and the execution fails because of that. Otherwise, the bytes stored there (the pointer to the string) are interpreted as machine code, which probably doesn't do anything meaningful and leads to a crash. Neither is desirable; neither is guaranteed: this is a result of invoking undefined behaviour.

If you tried to compile main.c with -DUSE_WEIRD_STRING, the compilation would fail because the header would indicate that weird is a const char * and the code would try to use it as a function.
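For contrast, here is a hedged sketch of what code consistent with the string declaration (the -DUSE_WEIRD_STRING configuration) might look like: weird is data, so it is read, never called. The header declaration and the weird.c definition are collapsed into one file so that it compiles on its own, and the helper name show_weird is hypothetical.

```c
#include <stdio.h>

/* What weird.h declares when USE_WEIRD_STRING is defined. */
extern const char *weird;

/* What weird.c defines in the same configuration. */
const char *weird = "weird";

/* Hypothetical helper: prints the string and returns printf's result.
 * Writing weird(y) here instead would be rejected by the compiler,
 * because the declaration says weird is a pointer, not a function. */
int show_weird(void)
{
    return printf("weird = %s\n", weird);
}
```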

If you replaced the conditional code in weird.c with either the string or the function (unconditionally), then:

  • Either the compilation would fail if the file contained the function but -DUSE_WEIRD_STRING was set on the command line,
  • Or the compilation would fail if the file contained the string but you did not set -DUSE_WEIRD_STRING.

Normally, the header would contain an unconditional declaration for weird, either as a function or as a pointer (but without any provision for choosing between them at compile time).

The key point is that the header is included in both source files, so unless the conditional compilation flags make a difference, the compiler can check the code in the source files for consistency with the header, and therefore the two object files stand a chance of working together. If you subvert the checking by setting the compilation flags so that the two source files see different declarations in the header, then you're back to square one.

The header, therefore, declares the interfaces, and the source files are checked to ensure that they adhere to the interface. The headers are the glue that holds the system together. Consequently, any function (or variable) that must be accessed outside its source file should be declared in a header (one header only), and that header should be used in the source file where the function (or variable) is defined, and also in every source file that references the function (or variable). You should not write extern … weird …; in a source file; such declarations belong in a header. All functions (or variables) that are not referenced outside the source file where they're defined should be defined with static. This gives you the maximum chance of spotting problems before you run the program.
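A minimal sketch of a source file laid out along those lines. It assumes a header would declare only weird() and a hypothetical weird_call_count(); the file-scope variable call_count and the helper result_for are also hypothetical and, being private to the file, are static.

```c
#include <stddef.h>

/* In a real project these two lines would live in weird.h, and this file
 * would #include "weird.h" instead of repeating them. */
int weird(int *p);
int weird_call_count(void);

/* Used only in this file: static, so no other translation unit can
 * declare it with a different type or modify it. */
static int call_count = 0;

/* Helper used only in this file: static, and defined before its first
 * use, so it needs no separate prototype. */
static int result_for(const int *p)
{
    return p == NULL ? 42 : 99;
}

/* Externally visible functions, checked against the declarations above. */
int weird(int *p)
{
    call_count++;
    return result_for(p);
}

int weird_call_count(void)
{
    return call_count;
}
```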

You can use GCC to help you. For functions, you can insist on prototypes being in scope before a (non-static) function is referenced or defined (and before a static function is referenced — you can simply define a static function before it is referenced without a separate prototype). I use:

gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
    -Wold-style-definition -Wold-style-declaration …

The -Wall and -Wextra imply some, but not all, of the other -W… options, so that isn't a minimal set. And not all versions of GCC support both the -Wold-style-… options. But together, these options ensure that functions have a full prototype declaration before the function is used.
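A short sketch of the style these options push you towards. The functions answer and add_one are hypothetical, and the comments describe C11 semantics (C23 changed the meaning of empty parentheses in declarations).

```c
/* Under -std=c11, a declaration with empty parentheses, such as
 *     int add_one();
 * is not a prototype: it says nothing about the parameters, so calls are
 * not checked. -Wstrict-prototypes warns about such declarations. */

/* Full prototypes: (void) for no parameters, types listed otherwise.
 * Declaring them before the definitions also satisfies
 * -Wmissing-prototypes. */
int answer(void);
int add_one(int x);

int answer(void)
{
    return 42;
}

int add_one(int x)
{
    return x + 1;
}

/* With the prototype in scope, a call such as add_one("seven") draws a
 * diagnostic (a pointer passed where an int is expected). Had only
 * int add_one(); been visible, the same call would typically compile
 * without complaint and misbehave at run time. */
```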

answered Sep 21 '22 08:09 by Jonathan Leffler


Neither file on its own contains any error that would cause a problem with compilation. main.c correctly declares (but doesn't define) a function called weird and calls it, and weird.c correctly defines a char * named weird. After compilation, main.o contains an unresolved reference to weird and weird.o contains a definition.

Now, here's the fun part: neither .o file necessarily[*] contains anything about the type of weird. Just names and addresses. By the time linking is happening, it's too late to say "hey, main expects this to be an int(*)(int *) and what you provided is actually a char *!" All the linker does is see that the name weird is provided by one object and referenced by another, and fits the pieces together like a jigsaw puzzle. In C, it's entirely the programmer's job to make sure that all compilation units that use a given external symbol declare it with compatible types (not necessarily identical; there are intricate rules as to what are "compatible types"). If you don't, the resulting behavior is undefined and probably wrong.
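As a small illustration of "compatible, not identical" (a sketch; the names table and sum are hypothetical), the following pairs of declarations are compatible, so they may appear together in one translation unit, and would equally be acceptable in different translation units of one program:

```c
/* An array declared with unknown size and later defined with a size:
 * int[] and int[3] are compatible types. */
extern int table[];
int table[3] = {1, 2, 3};

/* A parameter declared as an array is adjusted to a pointer, so these
 * two declarations of sum have compatible types. */
int sum(int a[], int n);

int sum(int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```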

[*]: actually I can think of several cases where the object files do contain the types — for example, certain kinds of debugging information, or special .o files for link-time optimization. But as far as I know, even when the type information does exist, the linker doesn't use it to warn about things like this.

answered Sep 23 '22 08:09 by hobbs


I am taking a Linux point of view. Details could be different on other OSes.

Of course your latest edit cannot compile, since:

char *weird = "weird";
printf(weird); // wrong, remove this line

contains a statement (the printf) outside of any function. So let's assume you have removed that line. Then clang-3.7 -Wall -Wextra -c main.c gives several warnings:

main.c:7:9: warning: cast to 'int *' from smaller integer type 'int' [-Wint-to-pointer-cast]
    w = (int *)x;
        ^

main.c:8:9: warning: cast to 'int *' from smaller integer type 'int' [-Wint-to-pointer-cast]
    y = (int *)z;
        ^
main.c:7:16: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
    w = (int *)x;
               ^
main.c:6:10: note: initialize the variable 'x' to silence this warning
    int x, *y, z, *w;
         ^
          = 0
main.c:8:16: warning: variable 'z' is uninitialized when used here [-Wuninitialized]
    y = (int *)z;
               ^
main.c:6:17: note: initialize the variable 'z' to silence this warning
    int x, *y, z, *w;
                ^
                 = 0
4 warnings generated.

Technically, your example has undefined behavior. The implementation is not required to warn you, and bad things can happen (or not!).

You might get a diagnostic (though I am not sure) if you enable link-time optimization both at compile time and at link time, perhaps with

gcc -flto -Wall -Wextra -O  -c main.c
gcc -flto -Wall -Wextra -O  -c weird.c
gcc -flto -Wall -Wextra -O  main.o weird.o -o program

and you could replace gcc with clang if you wish. I suspect that asking for optimization (-O) is relevant.

Actually, I get no diagnostic with clang-3.7 -flto, but I do get one (an error, in fact) at the last link command with gcc 6:

 % gcc -flto -O -Wall -Wextra  weird.o main.o -o program  
 main.c:1:5: error: variable ‘weird’ redeclared as function
 int weird(int *);
     ^
 weird.c:3:7: note: previously declared here
 char *weird = "weird";
       ^
 lto1: fatal error: errors during merging of translation units
 compilation terminated.

(I am explaining for GCC which I know well, including some of its internals; for clang it should be similar)

With -flto, the compiler (e.g. lto1 with GCC) also runs at link time, so it can optimize across translation units (e.g. by inlining calls between them). It uses the compiler intermediate representations stored in the object files, and these representations contain type information. Without -flto, the last command (e.g. your clang main.o weird.o -o program) simply invokes the linker ld with appropriate options (e.g. for crt0 and the C standard library).

Current linkers don't keep or handle any type information (pedantically, type erasure has already happened, mostly done by the compiler itself). They just manage symbols (that is, C identifiers) in a simple symbol table and process relocations. The lack of type information in object files (more precisely, in the symbol tables known to the linker) is why name mangling is required for C++.

Read more about ELF, e.g. elf(5), the format used by object files and executables.

Replace clang or gcc with clang -v or gcc -v to understand what is happening; this shows you the underlying cc1, lto1, or ld processes.

As others explained, you really should share a common #include-d header file (if the C code is hand-written rather than machine-generated). Some C code generators avoid generating header files and instead emit the relevant (and identical) declarations in every generated C file.

answered Sep 25 '22 08:09 by Basile Starynkevitch