How compilers work: compilers are utility programs that take your code and transform it into executable machine code. When you run a compiler on your code, the preprocessor first reads the source code (the C++ file you just wrote).
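To make that first stage concrete, here is a minimal sketch (my own illustration, not from the original text): the preprocessor expands #include and #define directives into plain C++ text before the compiler proper ever sees the code.

#include <iostream>            // the preprocessor pastes in the entire <iostream> header here
#define SQUARE(x) ((x) * (x))  // a macro: expanded textually, never compiled as a function

int main() {
    // After preprocessing, the compiler sees: std::cout << ((4) * (4)) << '\n';
    std::cout << SQUARE(4) << '\n';
    return 0;
}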
No, compilers are made by humans and are no smarter or better than humans. At best they are more consistent, and certainly much faster.
Writing a compiler requires knowledge of a lot of areas of computer science - regular expressions, context-free grammars, syntax trees, graphs, etc. It can help you see how to apply the theory of computer science to real-world problems.
They are written in their host language; when they need to interact with the operating system to perform operations they cannot do on their own, they do so using the API the operating system provides. The C++ Standard Library is written in C++ because most of its implementation uses templates.
In my opinion, it's not the job of the compiler to fix what is, honestly, bad coding. You have, quite explicitly, told the compiler you want that first loop executed. It's the same as:
x = 0
sleep 6 // Let's assume this is defined somewhere.
print x
I wouldn't want the compiler removing my sleep statement just because it did nothing. You may argue that the sleep statement is an explicit request for a delay whereas your example is not. But then you will be allowing the compiler to make very high-level decisions about what your code should do, and I believe that to be a bad thing.
Code, and the compiler that processes it, are tools, and you need to be a tool-smith if you want to use them effectively. How many 12" chainsaws will refuse to try to cut down a 30" tree? How many drills will automatically switch to hammer mode if they detect a concrete wall?
None, I suspect, and this is because the cost of designing this into the product would be horrendous for a start. But, more importantly, you shouldn't be using drills or chainsaws if you don't know what you're doing. For example: if you don't know what kickback is (a very easy way for a newbie to take off their arm), stay away from chainsaws until you do.
I'm all for allowing compilers to suggest improvements but I'd rather maintain the control myself. It should not be up to the compiler to decide unilaterally that a loop is unnecessary.
For example, I've done timing loops in embedded systems where the clock speed of the CPU is known exactly but no reliable timing device is available. In that case, you can calculate precisely how long a given loop will take and use that to control how often things happen. That wouldn't work if the compiler (or assembler in that case) decided my loop was useless and optimized it out of existence.
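For illustration only, here is a minimal C++ sketch of such a timing loop (the clock speed, cycles-per-iteration figure, and names are assumptions, not taken from the original answer). The volatile counter is exactly what keeps a modern compiler from "helpfully" deleting the loop:

// Busy-wait delay calibrated from a known CPU clock. Assumes an 8 MHz core
// and roughly 4 cycles per iteration; both numbers are purely illustrative.
void delay_ms(unsigned long ms) {
    const unsigned long cycles_per_ms = 8000000UL / 1000UL;       // 8 MHz clock
    const unsigned long iterations = (cycles_per_ms / 4UL) * ms;  // ~4 cycles per iteration
    for (volatile unsigned long i = 0; i < iterations; ++i) {
        // empty body: the loop itself is the delay
    }
}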
Having said that, let me leave you with an old story of a VAX FORTRAN compiler that was being benchmarked for performance and was found to be many orders of magnitude faster than its nearest competitor.
It turned out the compiler had noticed that the results of the benchmark loops weren't used anywhere else and optimized the loops into oblivion.
Oh, I don't know. Sometimes compilers are pretty smart. Consider the following C program:
#include <stdio.h> /* printf() */

int factorial(int n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}

int main() {
    int n = 10;
    printf("factorial(%d) = %d\n", n, factorial(n));
    return 0;
}
On my version of GCC (4.3.2 on Debian testing), when compiled with no optimizations or with -O1, it generates code for factorial() like you'd expect, using a recursive call to compute the value. But at -O2, it does something interesting: it compiles it down to a tight loop:
factorial:
.LFB13:
        testl   %edi, %edi
        movl    $1, %eax
        je      .L3
        .p2align 4,,10
        .p2align 3
.L4:
        imull   %edi, %eax
        subl    $1, %edi
        jne     .L4
.L3:
        rep
        ret
Pretty impressive. The recursive call (not even tail-recursive) has been completely eliminated, so factorial now uses O(1) stack space instead of O(N). And although I have only very superficial knowledge of x86 assembly (actually AMD64 in this case, but I don't think any of the AMD64 extensions are being used above), I doubt that you could write a better version by hand. But what really blew my mind was the code it generated at -O3. The implementation of factorial stayed the same, but main() changed:
main:
.LFB14:
        subq    $8, %rsp
.LCFI0:
        movl    $3628800, %edx
        movl    $10, %esi
        movl    $.LC0, %edi
        xorl    %eax, %eax
        call    printf
        xorl    %eax, %eax
        addq    $8, %rsp
        ret
See the movl $3628800, %edx line? GCC is pre-computing factorial(10) at compile time. It doesn't even call factorial(). Incredible. My hat is off to the GCC development team.
Of course, all the usual disclaimers apply: this is just a toy example, premature optimization is the root of all evil, and so on. But it illustrates that compilers are often smarter than you think. If you think you can do a better job by hand, you're almost certainly wrong.
(Adapted from a posting on my blog.)
Speaking from a C/C++ point of view:
Your first example will be optimized away by most compilers. If the Java compiler from Sun really executes this loop, it's the compiler's fault, but take my word for it that any post-1990 C, C++, or Fortran compiler eliminates such a loop completely.
Your second example can't be optimized in most languages because memory allocation happens as a side effect of concatenating the strings together. If a compiler optimized the code, the pattern of memory allocation would change, and this could lead to effects that the programmer is trying to avoid. Memory fragmentation and related problems are issues that embedded programmers still face every day.
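The original question's code isn't reproduced here, but the kind of loop being described is presumably something like this C++ sketch, where every append can allocate:

#include <string>

// Each += may grow the string's buffer, so allocation (and possible
// fragmentation) is a side effect of the loop, not just the final value.
std::string build(int n) {
    std::string s;
    for (int i = 0; i < n; ++i) {
        s += "some text ";   // may reallocate on any iteration
    }
    return s;
}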
Overall I'm satisfied with the optimizations compilers can do these days.
Compilers are designed to be predictable. This may make them look stupid from time to time, but that's OK. The compiler writer's goals are:
- You should be able to look at your code and make reasonable predictions about its performance.
- Small changes in the code should not result in dramatic differences in performance.
- If a small change looks to the programmer like it should improve performance, it should at least not degrade performance (unless surprising things are happening in the hardware).
All these criteria militate against "magic" optimizations that apply only to corner cases.
Both of your examples have a variable updated in a loop but not used elsewhere. This case is actually quite difficult to pick up unless you are using some sort of framework that can combine dead-code elimination with other optimizations like copy propagation or constant propagation. To a simple dataflow optimizer the variable doesn't look dead. To understand why this problem is hard, see the paper by Lerner, Grove, and Chambers in POPL 2002, which uses this very example and explains why it is hard.
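As a concrete sketch of that pattern (my own example, in the spirit of the ones being discussed): the variable is written on every iteration and read inside the loop, but its final value never escapes, so a naive liveness analysis still considers it live.

// 'sum' is read by 'sum + i' each iteration, so a simple dataflow pass sees
// it as live inside the loop. Proving the whole loop dead requires combining
// dead-code elimination with other analyses such as constant propagation.
int wasted_work() {
    int sum = 0;
    for (int i = 0; i < 1000000; ++i) {
        sum = sum + i;   // written here, never used after the loop
    }
    return 0;            // the final value of 'sum' is discarded
}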