I have read a little about hoisting and reordering, and it seems that the Java VM may choose to hoist some expressions. I have also read about the hoisting of function declarations in JavaScript.
First question: can someone confirm whether hoisting generally exists in C, C++ and Java? Or is it entirely dependent on the compiler and its optimizations?
I have read a lot of example C code that always puts variable declarations at the top, before any assert or boundary condition. I thought it would be a little faster to do all the asserts and boundary checks before the variable declarations, given that the function could simply return early.
Main question: must variable declarations always be at the top of a scope? (Is hoisting at work here?) Or does the compiler automatically optimize the code by checking these independent asserts and boundary cases first, before the unrelated variable declarations?
Here's a related example:
void MergeSort(struct node** headRef) {
    struct node* a;
    struct node* b;
    if ((*headRef == NULL) || ((*headRef)->next == NULL)) {
        return;
    }
    FrontBackSplit(*headRef, &a, &b);
    MergeSort(&a);
    MergeSort(&b);
    *headRef = SortedMerge(a, b);
}
As shown above, the boundary case does not depend on the variables "a" and "b". Would putting the boundary case above the variable declarations therefore make it slightly faster?
Updates:
The above example isn't as good as I had hoped, because the variables "a" and "b" are only declared there, not initialized. The compiler generates no code for such a declaration until the variables are actually used.
I checked the assembly that GCC generates for variable declarations with initializers, and the instruction order differs. The compiler did not change the order of my independent asserts and boundary cases, so reordering them in the source does change the generated assembly, and therefore what the machine executes.
I suppose the difference is so minuscule that most people never care about it.
The compiler may reorder or modify your code as it wishes, as long as the observable behavior of the modified code is the same as if the original had been executed sequentially (the "as-if" rule). So hoisting is allowed, but not required. It is an optimization and it is completely compiler-specific.
Variable declarations in C++ can go wherever you wish. In C they used to have to be at the top of a block, but the C99 standard relaxed that rule, so they can now appear anywhere in a block, as in C++. Still, many C programmers stick to putting them at the top.
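To illustrate the C99 rule, here is a minimal sketch in which the assert and the boundary case come first and the locals are declared only afterwards. The function sum_first and its parameters are made up for this example. It compiles as C99 or later, but not as strict C89.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the first n elements of values. */
long sum_first(const int *values, size_t n) {
    /* Precondition and boundary case first; no locals exist yet. */
    assert(values != NULL);
    if (n == 0) {
        return 0;
    }

    /* C99 and later allow a declaration after a statement. */
    long total = 0;
    for (size_t i = 0; i < n; i++) {   /* loop-scoped declaration, also C99 */
        total += values[i];
    }
    return total;
}
```

Whether this ordering is any faster than declaring total at the top is up to the compiler; for plain locals like these it most likely makes no difference.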
In your example, the compiler is free to move the if statement to the top, but I don't think it would. These variables are just uninitialized pointers declared on the stack, so the cost of declaring them is minimal; moreover, it might be more efficient to reserve space for them at the beginning of the function rather than after the asserts.
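For reference, this is roughly what your function looks like with the boundary case moved above the declarations, which is legal in C99 and C++. The question does not show struct node, FrontBackSplit or SortedMerge, so the definitions below are my own assumptions (a singly linked list of ints, a slow/fast pointer split and an iterative merge), added only to make the sketch compile.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout; the question does not show it. */
struct node {
    int data;
    struct node *next;
};

/* Assumed implementation: splits a list of at least two nodes into halves. */
static void FrontBackSplit(struct node *source,
                           struct node **frontRef, struct node **backRef) {
    struct node *slow = source;
    struct node *fast = source->next;
    while (fast != NULL) {
        fast = fast->next;
        if (fast != NULL) {
            slow = slow->next;
            fast = fast->next;
        }
    }
    *frontRef = source;
    *backRef = slow->next;
    slow->next = NULL;          /* cut the list after the front half */
}

/* Assumed implementation: merges two sorted lists into one sorted list. */
static struct node *SortedMerge(struct node *a, struct node *b) {
    struct node dummy;          /* placeholder head, never returned */
    struct node *tail = &dummy;
    dummy.next = NULL;
    while (a != NULL && b != NULL) {
        if (a->data <= b->data) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return dummy.next;
}

void MergeSort(struct node **headRef) {
    assert(headRef != NULL);
    /* Boundary case first: lists of length 0 or 1 are already sorted. */
    if (*headRef == NULL || (*headRef)->next == NULL) {
        return;
    }
    /* Declarations after a statement: fine in C99 and C++. */
    struct node *a;
    struct node *b;
    FrontBackSplit(*headRef, &a, &b);
    MergeSort(&a);
    MergeSort(&b);
    *headRef = SortedMerge(a, b);
}
```

Since a and b have no initializers, I would expect the generated code to be essentially the same as for your original ordering.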
If your declarations involved any side effects, for example
struct node *a = some_function();
then the compiler would be limited in what it can reorder.
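A small sketch of why that matters: when the initializer has an observable side effect, the position of the declaration changes what the program does, so the compiler cannot silently move it past the early return. All names here (next_id, decl_first, check_first, call_count) are hypothetical.

```c
#include <stddef.h>

static int calls = 0;   /* counts how often the initializer ran */

/* Hypothetical initializer with an observable side effect. */
static int next_id(void) {
    return ++calls;
}

/* Declaration first: next_id() runs even when we return early. */
int decl_first(const char *s) {
    int id = next_id();
    if (s == NULL) {
        return -1;
    }
    return id;
}

/* Check first: next_id() is skipped on the early return. */
int check_first(const char *s) {
    if (s == NULL) {
        return -1;
    }
    int id = next_id();
    return id;
}

/* Exposes the counter so the difference can be observed. */
int call_count(void) {
    return calls;
}
```

Calling decl_first(NULL) bumps the counter while check_first(NULL) does not. Here the order in the source is a matter of correctness, not of speed.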
Edit:
I checked GCC's loop hoisting in practice with this short program:
#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    while (i<=10 && dummy != 4)
        printf("%d\n", i++);
    return 0;
}
I've compiled it with this command:
gcc -std=c99 -pedantic test.c -S -o test.asm
This is the output:
.file "test.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "%d\12\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $32, %esp
call ___main
movl 8(%ebp), %eax
addl %eax, %eax
movl %eax, 24(%esp)
movl $1, 28(%esp)
jmp L2
L4:
movl 28(%esp), %eax
leal 1(%eax), %edx
movl %edx, 28(%esp)
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _printf
L2:
cmpl $10, 28(%esp)
jg L3
cmpl $4, 24(%esp)
jne L4
L3:
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.ident "GCC: (GNU) 4.8.2"
.def _printf; .scl 2; .type 32; .endef
Then I've compiled it with this command:
gcc -std=c99 -pedantic test.c -O3 -S -o test.asm
This is the output:
.file "test.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "%d\12\0"
.section .text.startup,"x"
.p2align 4,,15
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB7:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
pushl %ebx
andl $-16, %esp
subl $16, %esp
.cfi_offset 3, -12
call ___main
movl 8(%ebp), %eax
leal (%eax,%eax), %edx
movl $1, %eax
cmpl $4, %edx
jne L8
jmp L6
.p2align 4,,7
L12:
movl %ebx, %eax
L8:
leal 1(%eax), %ebx
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _printf
cmpl $11, %ebx
jne L12
L6:
xorl %eax, %eax
movl -4(%ebp), %ebx
leave
.cfi_restore 5
.cfi_restore 3
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE7:
.ident "GCC: (GNU) 4.8.2"
.def _printf; .scl 2; .type 32; .endef
So basically, with optimization turned on the original code was transformed to something like this:
#include <stdio.h>
int main(int argc, char **argv) {
    int dummy = 2 * argc;
    int i = 1;
    if (dummy != 4)
        while (i<=10)
            printf("%d\n", i++);
    return 0;
}
So, as you can see, GCC moved the loop-invariant test (dummy != 4) out of the loop at -O3, which means there is indeed hoisting in C.
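If you want to convince yourself that the two forms are equivalent, here is a sketch that writes both by hand and counts iterations instead of printing. The function names are made up, and the counter stands in for the printf call.

```c
/* Original form: the loop-invariant test (dummy != 4) sits in the loop
   condition and is evaluated on every iteration. */
int count_original(int argc) {
    int dummy = 2 * argc;
    int i = 1;
    int printed = 0;
    while (i <= 10 && dummy != 4) {
        printed++;      /* stands in for printf("%d\n", i) */
        i++;
    }
    return printed;
}

/* Hoisted form: the invariant test is evaluated once, before the loop. */
int count_hoisted(int argc) {
    int dummy = 2 * argc;
    int i = 1;
    int printed = 0;
    if (dummy != 4) {
        while (i <= 10) {
            printed++;
            i++;
        }
    }
    return printed;
}
```

Both functions return the same count for any argc. The transformation is only valid because dummy never changes inside the loop.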