I was recently fighting some problems trying to compile an open source library on my Mac that depended on another library and got some errors about incompatible library architectures. Can somebody explain the concept behind compiling a C program for a specific architecture? I have seen the -arch compiler flag before and have seen values passed to it such as ppc, i386 and x86_64 which I assume maps to the CPU "language", but my understanding stops there. If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well? How can I tell what architecture a given program/process is running under?
Can somebody explain the concept behind compiling a C program for a specific architecture?
Yes. The idea is to translate C into a sequence of native machine instructions, which encode the program in binary form. The meaning of "architecture" here is "instruction-set architecture", which defines the instructions a CPU understands and how they are encoded in binary. For example, every architecture has its own encoding for an instruction that adds two integers.
The reason to compile to machine instructions is that the CPU executes them directly, so they run very fast.
If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well?
Yes. (Exceptions exist but they are rare.)
How can I tell what architecture a given program/process is running under?
If a process is running on your hardware, it is running on the native architecture, which on Unix you can discover by running the command uname -m, although for the human reader the output from uname -a may be more informative.
If you have an executable binary or a shared library (.so file), you can discover its architecture using the file command:
% file /lib/libm-2.10.2.so
/lib/libm-2.10.2.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
% file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
You can see that these binaries have been compiled for the very old 80386 architecture, even though my hardware is a more modern i686. The i686 (Pentium Pro) is backward compatible with the 80386 and runs 80386 binaries as well as native binaries. To make this backward compatibility possible, Intel went to a great deal of trouble and expense, but they practically cornered the market on desktop CPUs, so it was worth it!
One thing that may be confusing here is that the Mac platform has what they call a universal binary, which is really two binaries in one archive, one for the Intel architecture and the other for the ppc architecture. Your computer will automatically decide which one to run. You can (sometimes) run a binary for another architecture in an emulation mode, and some architectures are supersets of others (i.e. i386 code will usually run on an i486, i586, i686, etc.), but for the most part the only code you can run is code for your processor's architecture.
For cross compiling, not only the program but all the libraries it uses need to be compatible with the target processor. Sometimes this means having a second compiler installed; sometimes it is just a question of having the right extra module for the compiler available. The cross compiler for gcc is actually a separate executable, though it can sometimes be accessed via a command line switch. The gcc cross compilers for various architectures are most likely separate installs.
To build for an architecture other than your CPU's native one, you will need a cross-compiler, which means that the generated code cannot run natively on the machine you're sitting at. GCC can do this fine. To find out which architecture a program is built for, check out the file command. On Linux-based systems at least, a 32-bit x86 program will require 32-bit x86 libs to go along with it. I guess it's the same for most OSes.