
Relation between object file and shared object file

What is the relation between a shared object (.so) file and an object (.o) file?

Can you please explain with an example?

ASHOK asked Jul 31 '09 05:07


People also ask

What is a shared object file?

A shared library or shared object is a file that is intended to be shared by multiple programs. Symbols used by a program are loaded from shared libraries into memory at load time or runtime.

What is the difference between .so and .a file?

A .a file is a static library, while a .so file is a shared object (dynamic library), similar to a DLL on Windows.

How object files are linked?

A shared object file holds code and data suitable to be linked in two contexts. First, the link-editor can process it with other relocatable and shared object files to create other object files. Second, the runtime linker combines it with a dynamic executable file and other shared objects to create a process image.

What is the difference between object file and executable?

The main difference between object file and executable file is that an object file is a file generated after compiling the source code while an executable file is a file generated after linking a set of object files together using a linker.


1 Answer

Let's say you have the following C source file, call it name.c

    #include <stdio.h>
    #include <stdlib.h>

    void print_name(const char * name)
    {
        printf("My name is %s\n", name);
    }

When you compile it with cc -c name.c, you generate name.o. (Plain cc name.c would also try to link an executable, which fails here because there is no main.) The .o contains the compiled code and data for all functions and variables defined in name.c, as well as an index associating their names with the actual code. If you look at that index, say with the nm tool (available on Linux and many other Unixes), you'll notice two entries:

    00000000 T print_name
             U printf

What this means: there are two symbols (names of functions or variables, but not names of classes, structs, or any types) stored in the .o. The first, marked with T, has its definition in name.o. The other, marked with U, is merely a reference. The code for print_name can be found here, but the code for printf cannot. Before your program can run, every symbol that is only a reference has to be matched with its definition in some other object file or library, so that everything can be linked together into a complete program or complete library. An object file is therefore the definitions found in the source file, converted to binary form, and available for placing into a full program.
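To make the T and U markers more concrete, here is a small sketch of which C declarations end up in an object file's symbol table. The names struct person, greeting_count, format_prefix and greet are hypothetical and not part of the original name.c. The letters in the comments are what nm typically reports after compiling with cc -c; they can vary with the compiler and optimization level.

```c
#include <stdio.h>

/* A type declaration: it produces no symbol at all. */
struct person {
    const char *name;
};

/* A global variable defined here: nm typically shows B for
   zero-initialized data and D for other initialized data. */
int greeting_count = 0;

/* A static function is local to this file: nm typically shows a
   lowercase t, or nothing if the compiler inlines it away. */
static const char *format_prefix(void)
{
    return "My name is";
}

/* A non-static function defined here: nm shows T. The call to printf
   is only a reference, so nm shows U for printf. */
void greet(const struct person *p)
{
    printf("%s %s\n", format_prefix(), p->name);
    greeting_count++;
}
```

Running nm on the resulting object file should list greet and greeting_count as definitions and printf as undefined, while struct person does not appear at all.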

You can link together .o files one by one, but you don't: there are generally a lot of them, and they are an implementation detail. You'd really prefer to have them all collected into bundles of related objects, with well recognized names. These bundles are called libraries and they come in two forms: static and dynamic.

A static library (in Unix) is almost always suffixed with .a (examples include libc.a, which is the C core library, and libm.a, which is the C math library). Continuing the example, you'd build your static library with ar rc libname.a name.o. If you run nm on libname.a you'll see this:

    name.o:
    00000000 T print_name
             U printf

As you can see, it is primarily a big table of object files with an index for finding all the names in it. Just like object files, it contains both the symbols defined in every .o and the symbols referred to by them. If you were to add another .o to the archive (e.g. date.o, defining print_date), you'd see another entry like the one above.
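The "table of object files with an index" idea can be sketched as a toy data structure. This is only an illustrative model, not the real ar file format: the types toy_symbol and toy_member, the table toy_libname, and the function toy_find_definition are all made up for this example, and date.o is the hypothetical second member mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-table entry: a name and whether this member defines it. */
struct toy_symbol {
    const char *name;
    int defined;            /* 1 = like nm's T, 0 = like nm's U */
};

/* One archive member, standing in for a .o file inside the .a. */
struct toy_member {
    const char *file;
    const struct toy_symbol *symbols;
    size_t symbol_count;
};

static const struct toy_symbol name_symbols[] = {
    { "print_name", 1 },
    { "printf", 0 },
};

static const struct toy_symbol date_symbols[] = {
    { "print_date", 1 },
    { "printf", 0 },
};

/* The toy "library": a table of members, each with its own symbols. */
static const struct toy_member toy_libname[] = {
    { "name.o", name_symbols, 2 },
    { "date.o", date_symbols, 2 },
};

/* Return the member file that defines `symbol`, or NULL if none does. */
const char *toy_find_definition(const char *symbol)
{
    size_t member_count = sizeof toy_libname / sizeof toy_libname[0];
    for (size_t m = 0; m < member_count; m++) {
        for (size_t s = 0; s < toy_libname[m].symbol_count; s++) {
            const struct toy_symbol *sym = &toy_libname[m].symbols[s];
            if (sym->defined && strcmp(sym->name, symbol) == 0)
                return toy_libname[m].file;
        }
    }
    return NULL;
}
```

In this model, looking up print_name leads to name.o, while printf is defined by no member and would have to come from another library such as libc.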

If you link a static library into an executable, the library's code is embedded in the executable. This is just like linking in the individual .o files, although the linker normally copies in only those members of the archive that are needed to resolve undefined symbols. As you can imagine, this can make your program very large, especially if you are using (as most modern applications are) a lot of libraries.

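A dynamic or shared library is suffixed with .so. Like its static analogue, it collects the compiled code from a set of object files, but the linker merges them into a single image rather than keeping them as separate members. You'd build it with cc -shared -o libname.so name.o (on most platforms the object files should first be compiled as position-independent code, e.g. with -fPIC). Looking at it with nm is quite a bit different from the static library, though. On my system it contains about two dozen symbols, only two of which are print_name and printf: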

    00001498 a _DYNAMIC
    00001574 a _GLOBAL_OFFSET_TABLE_
             w _Jv_RegisterClasses
    00001488 d __CTOR_END__
    00001484 d __CTOR_LIST__
    00001490 d __DTOR_END__
    0000148c d __DTOR_LIST__
    00000480 r __FRAME_END__
    00001494 d __JCR_END__
    00001494 d __JCR_LIST__
    00001590 A __bss_start
             w __cxa_finalize@@GLIBC_2.1.3
    00000420 t __do_global_ctors_aux
    00000360 t __do_global_dtors_aux
    00001588 d __dso_handle
             w __gmon_start__
    000003f7 t __i686.get_pc_thunk.bx
    00001590 A _edata
    00001594 A _end
    00000454 T _fini
    000002f8 T _init
    00001590 b completed.5843
    000003c0 t frame_dummy
    0000158c d p.5841
    000003fc T print_name
             U printf@@GLIBC_2.0

A shared library differs from a static library in one very important way: it does not embed itself in your final executable. Instead the executable contains a reference to that shared library that is resolved, not at link time, but at run-time. This has a number of advantages:

  • Your executable is much smaller. It only contains the code you explicitly linked via the object files. The external libraries are references and their code does not go into the binary.
  • You can share (hence the name) one library's bits among multiple executables.
  • You can, if you are careful about binary compatibility, update the code in the library between runs of the program, and the program will pick up the new library without you needing to change it.
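One way to see run-time resolution happen is to do it by hand with dlopen and dlsym. This is a sketch of explicit run-time loading, which is related to, but not the same as, the automatic resolution the dynamic loader performs for a normally linked program. The function name cos_via_dlopen is made up for this example, and the library name libm.so.6 is an assumption that holds on typical glibc-based Linux systems; older systems may also need -ldl when linking.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Look up cos() in the shared math library while the program is running.
   Returns 0 and stores cos(x) in *result on success, -1 on failure.
   "libm.so.6" is an assumed library name (glibc on Linux). */
int cos_via_dlopen(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    double (*cos_fn)(double);
    /* Idiom from the POSIX documentation for storing dlsym's void *
       result in a function pointer. */
    *(void **)(&cos_fn) = dlsym(handle, "cos");
    if (cos_fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cos_fn(x);
    dlclose(handle);
    return 0;
}
```

Nothing in the compiled caller contains the code for cos. Only the name is stored, and the definition is found when the call is made, so a replaced libm would be picked up on the next run.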

There are some disadvantages:

  • It takes time to link a program together. With shared libraries some of this time is deferred to every time the executable runs.
  • The process is more complex. All the additional symbols in the shared library are part of the infrastructure needed to make the library link up at run-time.
  • You run the risk of subtle incompatibilities between differing versions of the library. On Windows this is called "DLL hell".

(If you think about it many of these are the reasons programs use or do not use references and pointers instead of directly embedding objects of a class into other objects. The analogy is pretty direct.)
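The analogy can be sketched in a few lines of C. All of the names here (settings, embedding_holder, referencing_holder and the two accessor functions) are hypothetical and exist only to illustrate the point: an embedded object behaves like a statically linked library, and a pointer to a shared object behaves like a shared library.

```c
struct settings {
    int verbosity;
};

/* Embedding: each holder carries its own copy, like a static library
   copied into every executable. */
struct embedding_holder {
    struct settings settings;
};

/* Referencing: holders point at one shared object, like executables
   that refer to a single shared library. */
struct referencing_holder {
    struct settings *settings;
};

/* Reads the holder's private copy. */
int embedded_verbosity(const struct embedding_holder *h)
{
    return h->settings.verbosity;
}

/* Follows the pointer to the shared object on every call. */
int referenced_verbosity(const struct referencing_holder *h)
{
    return h->settings->verbosity;
}
```

If the shared settings object is changed after both holders are set up, only the referencing holder sees the new value. That mirrors the trade-off above: updates arrive without rebuilding, at the cost of an extra indirection and a dependency on something that can change underneath you.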

OK, that's a lot of detail, and I've skipped a lot, such as how the linking process actually works. I hope you can follow it. If not, ask for clarification.

quark answered Oct 13 '22 04:10