
Why don't two binaries of programs with only comments changed exactly match in gcc?

I created two C programs

  1. Program 1

    int main() { }

  2. Program 2

    int main() {
        //Some Harmless comments
    }

AFAIK, when compiling, the compiler (gcc) should ignore comments and redundant whitespace, and hence the output should be identical.

But when I checked the md5sums of the output binaries, they don't match. I also tried compiling with optimisation -O3 and -Ofast but they still didn't match.

What is happening here?

EDIT: the exact commands and their md5sums are (t1.c is program 1 and t2.c is program 2):

    gcc ./t1.c -o aaa
    gcc ./t2.c -o bbb
    98c1a86e593fd0181383662e68bac22f  aaa
    c10293cbe6031b13dc6244d01b4d2793  bbb

    gcc ./t2.c -Ofast -o bbb
    gcc ./t1.c -Ofast -o aaa
    2f65a6d5bc9bf1351bdd6919a766fa10  aaa
    c0bee139c47183ce62e10c3dbc13c614  bbb

    gcc ./t1.c -O3 -o aaa
    gcc ./t2.c -O3 -o bbb
    564a39d982710b0070bb9349bfc0e2cd  aaa
    ad89b15e73b26e32026fd0f1dc152cd2  bbb

And yes, the md5sums match across multiple compilations with the same flags.

BTW my system is gcc (GCC) 5.2.0 and Linux 4.2.0-1-MANJARO #1 SMP PREEMPT x86_64 GNU/Linux

Registered User asked Sep 04 '15 14:09


2 Answers

It's because the file names are different (although the strings output is the same). If you try modifying the file itself (rather than having two files), you'll notice that the output binaries are no longer different. As both Jens and I said, it's because GCC dumps a whole load of metadata into the binaries it builds, including the exact source filename (and AFAICS so does clang).

Try this:

    $ mkdir -p subdir
    $ cp code.c code2.c
    $ cp code.c subdir/code.c
    $ gcc code.c -o a
    $ gcc code2.c -o b
    $ gcc subdir/code.c -o a2
    $ diff a b
    Binary files a and b differ
    $ diff a2 b
    Binary files a2 and b differ
    $ diff -s a a2
    Files a and a2 are identical

This explains why your md5sums don't change between builds but do differ between the two files. If you want, you can do what Jens suggested and compare the output of strings for each binary; you'll notice that the filenames are embedded in the binary. If you want to "fix" this, you can strip the binaries and the metadata will be removed:

    $ strip a a2 b
    $ diff -s a b
    Files a and b are identical
    $ diff -s a2 b
    Files a2 and b are identical
    $ diff -s a a2
    Files a and a2 are identical
cyphar answered Oct 22 '22 04:10


The most common reasons are file names and time stamps added by the compiler (usually in the debug-info sections of the ELF file).

Try running

    $ strings -a program > x
    ...recompile program...
    $ strings -a program > y
    $ diff x y

and you might see the reason. I once used this to find out why the same source would produce different code when compiled in different directories. It turned out that the __FILE__ macro expanded to an absolute file name, which differed between the two trees.

Jens answered Oct 22 '22 06:10