I have a question about a Project Euler question and optimization using loop unrolling.
Problem description: 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
Solution:
#include <iostream>
#include <limits.h>
#include <stdio.h>
#include <time.h>
using namespace std;
int main() {
clock_t startTime = clock();
for (int i = 1; i < INT_MAX; i++)
{
bool isDivisible = true;
//CODE BLOCK #1
/*for (int j = 2; j <= 20; j++)
{
if ( i % j != 0)
{
isDivisible = false;
break;
{
}*/
//CODE BLOCK #2
/*if (i % 2 != 0 || i % 3 != 0 ||
i % 4 != 0 || i % 5 != 0 ||
i % 6 != 0 || i % 7 != 0 ||
i % 8 != 0 || i % 9 != 0 ||
i % 10 != 0 || i % 11 != 0 ||
i % 12 != 0 || i % 13 != 0 ||
i % 14 != 0 || i % 15 != 0 ||
i % 16 != 0 || i % 17 != 0 ||
i % 18 != 0 || i % 19 != 0 ||
i % 20 != 0 )
isDivisible = false;*/
if (isDivisible)
{
cout << "smallest: " << i << endl;
cout << "Ran in: " << clock() - startTime << " cycles" << endl;
break;
}
}
return 0;
}
Now, commenting out either CODE BLOCK #1 or CODE BLOCK #2 gives me the correct answer (232792560). However, CODE BLOCK #2 much faster than CODE BLOCK #1.
CODE BLOCK #1: 3,580,000 cycles (I just added the break into CODE BLOCK #1 and it runs much faster. Still significantly slower than the compound IF statement, however.)
CODE BLOCK #2: 970,000 cycles
Does anyone know why this huge performance difference might occur?
Using the ||
means that as soon as one is true, the rest of the conditions are not computed. This would be equivalent to the loop:
for (int j = 2; j <= 20; j++)
{
if ( i % j != 0){
isDivisible = false;
break;
}
}
If you try this you may find the gap in running times has been narrowed. Any other differences could be attributed to loop overhead, but with optimizations turned on in your compiler I suspect that they would run at the same speed (or at least have much more similar times).
EDIT About the new performance difference:
There are many optimized ways to check divisibility of numbers by constants, for example for N
any power of 2 i % N != 0
can be replaced with i & (N-1)
, others exist as well and are not as obvious.
The compiler knows lots of these little tricks and in the second code block is probably able to optimize most if not all of those divisibility checks (since they are written out directly by you) while in the first code block it has to decide to unroll the loops first and then substitute the loop variable with constants before it's even possible to reason about the different checks.
It's possible this difference is making for better optimized code in block 2 than in block 1.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With