 

GCC/g++ cout << vs. printf()

Tags: c++, c, gcc, assembly, g++

  • Why does printf("hello world") end up using more CPU instructions in the assembled code (ignoring the standard library code itself) than cout << "hello world"?

For C++ we have:

movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc

For C:

movl    $.LC0, %eax
movq    %rax, %rdi
movl    $0, %eax
call    printf
  • What are line 2 of the C++ code and lines 2 and 3 of the C code for?

I'm using gcc version 4.5.2

Flavius asked Mar 06 '11 16:03


3 Answers

For 64-bit gcc 4.5.0 with -O3 on Linux x86_64, cout << "Hello World" compiles to:

movl    $11, %edx         ; String length in EDX
movl    $.LC0, %esi       ; String pointer in ESI
movl    $_ZSt4cout, %edi  ; address of std::cout (the ostream& argument) in EDI
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

and for printf("Hello World"):

movl    $.LC0, %edi       ; String pointer to EDI
xorl    %eax, %eax        ; AL = 0: no vector (XMM) registers used by this variadic call
call    printf

This means the exact sequence depends on the specific compiler implementation, its version and the compiler options. Your edit states that you use gcc 4.5.2 (fairly new at the time). It seems that this version adds some extra 64-bit register shuffling: it loads the string address into EAX, copies RAX to RDI and then zeroes EAX, which makes no sense to me. Most likely, though, the question shows unoptimized (-O0) output, where gcc routinely routes values through RAX; the direct call to operator<< (instead of the inlined __ostream_insert) in the C++ listing points the same way.
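The $11 loaded into EDX in the optimized cout sequence is the length of "Hello World". libstdc++ implements operator<<(ostream&, const char*) as an inline function that calls its internal __ostream_insert(out, s, traits::length(s)); once that is inlined, the length of a string literal can be folded to a constant, which appears to be where the 11 comes from. The sketch below uses a hypothetical helper, insert_with_length, built only on the public ostream::write, to show the same "pointer plus precomputed length" shape.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// "Hello World" has 11 characters; sizeof also counts the trailing '\0'.
static_assert(sizeof("Hello World") - 1 == 11, "length that ends up in EDX");

// Hypothetical helper mirroring the shape of the optimized call: compute the
// length once, then hand (stream, pointer, length) to a single function.
// Unlike libstdc++'s internal __ostream_insert it ignores width() and fill(),
// so treat it as a sketch only.
std::ostream& insert_with_length(std::ostream& os, const char* s) {
    const std::size_t len = std::char_traits<char>::length(s);
    return os.write(s, static_cast<std::streamsize>(len));
}
```

Calling insert_with_length(out, "Hello World") on a std::ostringstream named out leaves exactly those 11 characters in out.str().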

More interesting is a call sequence that prints three arguments (g++ -O1 -S source.cpp):

#include <cstdio>
#include <iostream>

void c_proc()
{
    std::printf("%s %s %s", "Hello", "World", "!");
}

void cpp_proc()
{
    std::cout << "Hello " << "World " << "!";
}

leads to (c_proc):

movl    $.LC0, %ecx
movl    $.LC1, %edx
movl    $.LC2, %esi
movl    $.LC3, %edi
movl    $0, %eax
call    printf

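Here the .LCx labels are the string literals. All four arguments (the format string and the three strings) are passed in registers (RDI, RSI, RDX, RCX), and the stack pointer is not involved at all!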

For cpp_proc:

movl    $6, %edx
movl    $.LC4, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $6, %edx
movl    $.LC5, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l
movl    $1, %edx
movl    $.LC0, %esi
movl    $_ZSt4cout, %edi
call    _ZSt16__ostream_insertIcSt11char_traits...basic_ostreamIT_T0_ES6_PKS3_l

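Now you can see what this is all about: printf is called once with all four arguments in registers, while every << is a separate call to __ostream_insert that reloads the stream address, the string pointer and the string length.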
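To make the chaining concrete: os << "Hello " << "World " << "!" groups as ((os << "Hello ") << "World ") << "!", so each << is its own call, and the stream it returns becomes the first argument of the next one. The minimal sketch below writes to a std::ostringstream instead of std::cout so the result can be checked; the function names chained and nested_calls are made up for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Ordinary chained insertion, as in cpp_proc().
std::string chained() {
    std::ostringstream os;
    os << "Hello " << "World " << "!";
    return os.str();
}

// The same thing spelled out as three nested calls to the non-member
// std::operator<<(std::ostream&, const char*). Each call returns its stream
// argument, which is then passed as the first argument of the next call.
std::string nested_calls() {
    std::ostringstream os;
    std::operator<<(std::operator<<(std::operator<<(os, "Hello "), "World "), "!");
    return os.str();
}
```

Both functions produce "Hello World !", which is why the assembly shows three separate call sequences rather than one.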

Regards

rbo

rubber boots answered Sep 28 '22 10:09


The calling code is usually irrelevant to performance.

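Line 2 of the C++ code stores the address of std::cout in EDI, the first argument register. The called symbol demangles to the non-member std::operator<<(std::basic_ostream<char>&, const char*), so the stream is an ordinary first argument rather than the implicit 'this' of a method (on x86-64 both would be passed in RDI anyway).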
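Whether the stream is 'this' or an explicit first argument depends on the right-hand type: for arithmetic types operator<< is a member of std::basic_ostream, while for const char* it is a non-member function template in namespace std. On x86-64 both end up in RDI, which is why the caller's assembly looks the same. A small sketch that calls each form explicitly; via_member and via_non_member are hypothetical names.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// For int, operator<< is a member of std::basic_ostream, so the stream
// really is the implicit 'this' argument.
std::string via_member(int value) {
    std::ostringstream os;
    os.operator<<(value);          // same as: os << value;
    return os.str();
}

// For const char*, operator<< is a non-member function template in std,
// so the stream is an ordinary first parameter.
std::string via_non_member(const char* text) {
    std::ostringstream os;
    std::operator<<(os, text);     // same as: os << text;
    return os.str();
}
```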

I first thought the C part was incomplete, because the upper 32 bits of RAX do not seem to be initialized in this snippet. That is not the case: on x86-64, any write to a 32-bit register such as EAX zero-extends into the full 64-bit register, so movl $.LC0, %eax sets all of RAX.

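From what I understand, the compiler prefers 32-bit operations when initializing 64-bit registers. A 64-bit register can take an immediate (movq sign-extends a 32-bit immediate, movabsq takes a full 64-bit one), but those encodings are longer: movl $imm32, %edi is 5 bytes, movq $imm32, %rdi is 7 and movabsq is 10. Because a 32-bit write zero-extends, and static addresses fit in 32 bits under the default small code model, a movl is enough to put a pointer such as $.LC0 into the 64-bit RDI register.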

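It also seems that printf takes the value of AL (the low byte of RAX) as a hidden input. Under the System V x86-64 ABI, the caller of a variadic function sets AL to an upper bound on the number of vector (XMM) registers used to pass arguments. Floating-point arguments such as double travel in XMM registers, and printf's prologue typically uses AL to decide whether it must save those registers for va_arg. The string itself is passed in RDI, not in an XMM register; with no floating-point arguments AL is simply 0, hence the movl $0, %eax (or xorl %eax, %eax) before the call.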
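AL only carries a nonzero value when floating-point values are passed to a variadic function. The sketch below is a hypothetical variadic function, sum_doubles, that receives double arguments; the register details in its comments describe the System V x86-64 ABI and are not visible from C++ itself.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical variadic function: sums 'count' double arguments.
// Under the System V x86-64 ABI the caller passes the first eight
// floating-point arguments in XMM0..XMM7 and sets AL to an upper bound on
// how many vector registers it used; the callee's prologue can use AL to
// skip saving XMM registers that va_arg will never read.
double sum_doubles(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, double);   // callers must pass doubles, not ints
    }
    va_end(args);
    return total;
}
```

Compiling a call such as sum_doubles(3, 1.0, 2.0, 3.5), or printf("%f\n", 3.14), with gcc on x86-64 Linux would typically show a nonzero value loaded into EAX right before the call, instead of the 0 seen for a string-only printf.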

BatchyX answered Sep 28 '22 10:09


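int printf(const char*, ...) is a variadic function that takes one or more arguments, whereas ostream& operator<<(ostream&, const char*) takes exactly two. I believe that accounts for the difference in the instructions needed to invoke them: for a variadic call, the x86-64 calling convention requires the caller to set AL, which a call with a fixed argument list does not need.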
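The "one fixed argument followed by ..." shape of printf can be reproduced in your own code. Below is a minimal sketch of a hypothetical printf-style wrapper, format_to_string, that forwards its variadic part to the standard vsnprintf as a va_list.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical printf-style wrapper: one fixed argument (the format) plus
// any number of extra arguments, forwarded to vsnprintf as a va_list.
std::string format_to_string(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list copy;
    va_copy(copy, args);                       // first pass measures, second pass writes
    const int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    std::string out;
    if (needed > 0) {
        out.resize(static_cast<std::size_t>(needed) + 1);   // room for the '\0'
        std::vsnprintf(&out[0], out.size(), fmt, args);
        out.resize(static_cast<std::size_t>(needed));       // drop the '\0'
    }
    va_end(args);
    return out;
}
```

The same function accepts format_to_string("hello world") and format_to_string("%s %s %s", "Hello", "World", "!"), which is exactly the flexibility a fixed two-argument operator<< does not have.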

Line 2 in the C++ disassembly is where it passes the ostream& (in this case cout), so the function knows which stream object it is writing to.
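The same convention applies to your own inserters: the stream is the first parameter, so one function can write to std::cout, a file or a string stream, and the caller decides which. A short sketch with a hypothetical Point type and helper point_to_string.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point { int x; int y; };   // hypothetical type for illustration

// The target stream arrives as an ordinary first argument (in RDI on x86-64).
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Hypothetical helper: route the same operator<< into a string stream.
std::string point_to_string(const Point& p) {
    std::ostringstream os;
    os << p;
    return os.str();
}
```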

Since both end up making a function call, the comparison is largely irrelevant; the code executed inside the function will be far more significant. operator<< is overloaded for a number of right-hand-side types and the overload is resolved at compile time, whereas printf() must parse the format string at runtime to determine the argument types, so it may incur additional overhead. Either way, the code executed within the functions will swamp the call overhead in terms of instructions executed, and will almost certainly be dominated by the OS code required to render the text on a graphical display. In short, you are sweating the small stuff.
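The compile-time versus runtime distinction can be sketched side by side. Below, describe is chosen by overload resolution from the static argument type, while mini_format is a hypothetical, heavily stripped-down printf that only understands %d, %s and %%; it has to scan the format string while the program runs to learn which type to pull from the va_list. Real printf implementations also handle flags, widths and length modifiers.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Compile-time dispatch: the overload is picked from the static type.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "string"; }

// Runtime dispatch: walk the format string character by character.
std::string mini_format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string out;
    for (const char* p = fmt; *p != '\0'; ++p) {
        if (*p != '%' || p[1] == '\0') {   // ordinary character (or a trailing '%')
            out += *p;
            continue;
        }
        ++p;                               // step onto the conversion character
        switch (*p) {
            case 'd': out += std::to_string(va_arg(args, int)); break;
            case 's': out += va_arg(args, const char*);         break;
            case '%': out += '%';                               break;
            default:  out += '%'; out += *p;                    break;  // unsupported: copy through
        }
    }
    va_end(args);
    return out;
}
```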

Clifford answered Sep 28 '22 10:09