Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this program not optimized away?

Consider the following, simple program (adapted from this question):

#include <cstdlib>

int main(int argc, char** argv) {
    int mul1[10] = { 4, 1, 8, 6, 3, 2, 5, 8, 6, 7 }; // sum = 50
    int mul2[10] = { 4, 1, 8, 6, 7, 9, 5, 1, 2, 3 }; // sum = 46

    int x1 = std::atoi(argv[1]);
    int x2 = std::atoi(argv[2]);

    int result = 0;

    // For each element in mul1/mul2, accumulate the product with x1/x2 in result
    for (int i = 0; i < 10; ++i) {
        result += x1 * mul1[i] + x2 * mul2[i];
    }

    return result;
}

I believe it is functionally equivalent to the following one:

#include <cstdlib>

int main(int argc, char** argv) {
    int x1 = std::atoi(argv[1]);
    int x2 = std::atoi(argv[2]);

    return x1 * 50 + x2 * 46;
}

And yet clang 3.7.1, gcc 5.3 and icc 13.0.1 seem to be unable to make such optimization, even with -Ofast. (Note by the way how the generated assembly is vastly different between compilers!). However, when removing mul2 and x2 from the equation, clang is able to perform a similar optimization, even with -O2.

What prevents both compilers from optimizing the first program into the second?

like image 941
user703016 Avatar asked Jun 10 '15 08:06

user703016


2 Answers

The complete expression is too complicated even for clang. If you split it then everything gets optimized again:

int result1 = 0;
int result2 = 0;

for (int i = 0; i < 10; ++i) {
    result1 += x1 * mul1[i];
    result2 += x2 * mul2[i];
}

std::cout << (result1 + result2);
like image 194
dlask Avatar answered Sep 21 '22 14:09

dlask


I'm not a compiler programmer so this can only be a guess. IMHO, the answer is part in @dlask's answer and part in the remark that clang does the optimisation when you remove x2 and mul2 from the expression.

The compiler may optimize away all that it can do. But I also think that optimizing compilers are already huge and complicated programs, in which bugs can have high consequences because they are at the base of almost everything else (Python most current interpreter is written in ... C)

So there must be a balance between the efficiency of the optimizer and its complexity, and I think that this example program is beyond it for gcc, and just around for clang. Nothing prevents them to do that optimization except that it is just too complicated for current version of those compilers.

like image 29
Serge Ballesta Avatar answered Sep 20 '22 14:09

Serge Ballesta