
GCC dies trying to compile 64bit code on OSX 10.6

Tags: c++, c, macos, gcc

I have a brand-new off-the-cd OSX 10.6 installation. I'd now like to compile the following trivial C program as a 64bit binary:

 #include <stdio.h>

 int main() 
 {
    printf("hello world");
    return 0;
 }

I invoke gcc as follows:

gcc -m64 hello.c

However, this fails with the following error:

Undefined symbols:
  "___gxx_personality_v0", referenced from:
      _main in ccUAOnse.o
      CIE in ccUAOnse.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

What's going on here? Why is gcc dying? Compiling without the -m64 flag works fine.

Asked Dec 03 '22 by Daniel

1 Answer

Two things:

I don't think you actually used gcc -m64 hello.c. The error you got is usually the result of doing something like gcc -m64 hello.cc, i.e., using the C compiler driver to compile C++ code.

shell% gcc -m64 hello.c
shell% ./a.out
hello world [added missing newline]
shell% cp hello.c hello.cc
shell% gcc -m64 hello.cc
Undefined symbols:
  "___gxx_personality_v0", referenced from:
      _main in ccYaNq32.o
      CIE in ccYaNq32.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

You can "get this to work" with the following:

shell% gcc -m64 hello.cc -lstdc++
shell% ./a.out
hello world

Second, -m64 is not the preferred way of specifying that you'd like to generate 64-bit code on Mac OS X. The preferred way is to use -arch ARCH, where ARCH is one of ppc, ppc64, i386, or x86_64. There may be more (or fewer) architectures available depending on how your tools are set up (i.e., iPhone ARM, ppc64 deprecated, etc.). Also, on 10.6, gcc defaults to -arch x86_64, i.e., it generates 64-bit code by default.

Using this style, it's possible to have the compiler create "fat binaries" automatically: you can pass -arch multiple times. For example, to create a "Universal Binary":

shell% gcc -arch x86_64 -arch i386 -arch ppc hello.c
shell% file a.out
a.out: Mach-O universal binary with 3 architectures
a.out (for architecture x86_64):    Mach-O 64-bit executable x86_64
a.out (for architecture i386):  Mach-O executable i386
a.out (for architecture ppc7400):   Mach-O executable ppc

EDIT: The following was added to answer the OP's question "I did make a mistake and call my file .cc instead of .c. I'm still confused about why this should matter?"

Well... that's a sort of complicated answer. I'll give a brief explanation, but I'll ask that you have a little faith that "there's actually a good reason."

It's fair to say that "compiling a program" is a fairly complicated process. For both historical and practical reasons, when you execute gcc -m64 hello.cc, the work is actually broken up into several discrete steps behind the scenes. These steps, each of which usually feeds its result to the next, are approximately:

  • Run the C pre-processor, cpp, on the source code being compiled. This step handles all the #include directives, the various #define macro expansions, and other "pre-processing" work.
  • Run the C compiler proper on the pre-processed result. The output of this step is a .s file: the C code compiled down to assembly language.
  • Run the as assembler on the .s source. This assembles the assembly language into a .o object file.
  • Run the ld linker on the .o file(s) to combine the compiled object files and the various static and dynamically linked libraries into a usable executable.

Note: This is a "typical" flow for most compilers. An individual compiler implementation doesn't have to follow the above steps. Some compilers combine multiple steps into one for performance reasons. Modern versions of gcc, for example, don't use a separate cpp pass. The tcc compiler, on the other hand, performs all the above steps in one pass, using no additional external tools or intermediate steps.

In the traditional compiler tool-chain flow above, the cc (or, in our case, gcc) command is called a "compiler driver". It's a logical front end to all of the above tools and steps, and it knows how to intelligently apply each tool (like the assembler and linker) in order to create a final executable. To do this, though, it usually needs to know the kind of file it is dealing with: you can't feed an assembled .o file to the C compiler, for example. Therefore, there are a number of "standard" file extensions used to specify the kind of file (see man gcc for more info):

  • .c, .h C source code and C header files.
  • .m Objective-C source code.
  • .cc, .cp, .cpp, .cxx, .c++ C++ Source code.
  • .hh C++ header file.
  • .mm, .M Objective-C++ source code.
  • .s Assembly language source code.
  • .o Assembled object code.
  • .a ar archive or static library.
  • .dylib Dynamic shared library.

It's also possible to override this automatically determined file type using various compiler flags (see man gcc for how to do this), but it's generally MUCH easier to just stick with the standard conventions so that everything "just works" automatically.

And, in a roundabout way, this is why, if you had used the C++ compiler driver, g++, in your original example, you wouldn't have encountered the problem:

shell% g++ -m64 hello.cc
shell% ./a.out
hello world

The reason is that gcc essentially says "use C rules when driving the tool chain" while g++ says "use C++ rules when driving the tool chain". g++ knows that to create a working executable it needs to pass -lstdc++ to the linker stage; gcc doesn't think this is necessary, even though it knew to use the C++ compiler at the compile stage because of the .cc file ending.

Some of the other C/C++ compilers available to you on Mac OS X 10.6 by default: gcc-4.0, gcc-4.2, g++-4.0, g++-4.2, llvm-gcc, llvm-g++, llvm-gcc-4.0, llvm-g++-4.0, llvm-gcc-4.2, llvm-g++-4.2, and clang. These tools (usually) swap out the first two steps of the tool-chain flow and use the same lower-level tools, like the assembler and linker. The llvm- compilers use the gcc front end to parse the C code and turn it into an intermediate representation, then use the llvm tools to transform that intermediate representation into code. Since the llvm tools use a "low-level virtual machine" as their near-final output, they allow a richer set of optimization strategies, the most notable being that optimizations can be performed across different, already compiled .o files. This is typically called link-time optimization. clang is a completely new C compiler that also targets the llvm tools, allowing for the same kinds of optimizations.

So, there you go. The not so short explanation of why gcc -m64 hello.cc failed for you. :)

EDIT: One more thing...

It's a common "compiler driver" technique to have commands like gcc and g++ symlink to the same all-in-one compiler driver executable. At run time, the compiler driver checks the path and file name that were used to create the process and switches rules depending on whether that name ends with gcc or g++ (or equivalent). This allows the developer of the compiler to reuse the bulk of the front-end code and change only the handful of differences required between the two.
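That name-based dispatch can be sketched with a tiny shell script standing in for the driver (purely illustrative; this is not gcc's actual implementation):

```shell
# A toy "driver" that checks the name it was invoked under ($0)
cat > driver.sh <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  *g++) echo "C++ rules: add -lstdc++ at link time" ;;
  *)    echo "C rules: plain C linking" ;;
esac
EOF
chmod +x driver.sh

cp driver.sh mygcc
ln -sf mygcc myg++   # same executable, different invocation name

./mygcc   # prints the C-rules message
./myg++   # prints the C++-rules message
```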

Answered Dec 04 '22 by johne