For an assignment of a course called High Performance Computing, I required to optimize the following code fragment:
int foobar(int a, int b, int N) { int i, j, k, x, y; x = 0; y = 0; k = 256; for (i = 0; i <= N; i++) { for (j = i + 1; j <= N; j++) { x = x + 4*(2*i+j)*(i+2*k); if (i > j){ y = y + 8*(i-j); }else{ y = y + 8*(j-i); } } } return x; }
Using some recommendations, I managed to optimize the code (or at least I think so), such as:
Here's my code:
int foobar(int a, int b, int N) { int i, j, x, y, t; x = 0; y = 0; for (i = 0; i <= N; i++) { t = i + 512; for (j = i + 1; j <= N; j++) { x = x + ((i<<3) + (j<<2))*t; } } return x; }
According to my instructor, a well optimized code instructions should have fewer or less costly instructions in assembly language level.And therefore must be run, the instructions in less time than the original code, ie calculations are made with::
execution time = instruction count * cycles per instruction
When I generate assembly code using the command: gcc -o code_opt.s -S foobar.c
,
the generated code has many more lines than the original despite having made some optimizations, and run-time is lower, but not as much as in the original code. What am I doing wrong?
Do not paste the assembly code as both are very extensive. So I'm calling the function "foobar" in the main and I am measuring the execution time using the time command in linux
int main () { int a,b,N; scanf ("%d %d %d",&a,&b,&N); printf ("%d\n",foobar (a,b,N)); return 0; }
Embedded C - Optimization techniques. Opens in a new window Opens an external website Opens an external website in a new window. This website stores data such as cookies to enable essential site functionality, as well as marketing, personalization, and analytics.
We say that code optimization is writing or rewriting code so a program uses the least possible memory or disk space, minimizes its CPU time or network bandwidth, or makes the best use of additional cores. In practice, we sometimes default to another definition: Writing less code.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
Initially:
for (i = 0; i <= N; i++) { for (j = i + 1; j <= N; j++) { x = x + 4*(2*i+j)*(i+2*k); if (i > j){ y = y + 8*(i-j); }else{ y = y + 8*(j-i); } } }
Removing y
calculations:
for (i = 0; i <= N; i++) { for (j = i + 1; j <= N; j++) { x = x + 4*(2*i+j)*(i+2*k); } }
Splitting i
, j
, k
:
for (i = 0; i <= N; i++) { for (j = i + 1; j <= N; j++) { x = x + 8*i*i + 16*i*k ; // multiple of 1 (no j) x = x + (4*i + 8*k)*j ; // multiple of j } }
Moving them externally (and removing the loop that runs N-i
times):
for (i = 0; i <= N; i++) { x = x + (8*i*i + 16*i*k) * (N-i) ; x = x + (4*i + 8*k) * ((N*N+N)/2 - (i*i+i)/2) ; }
Rewritting:
for (i = 0; i <= N; i++) { x = x + ( 8*k*(N*N+N)/2 ) ; x = x + i * ( 16*k*N + 4*(N*N+N)/2 + 8*k*(-1/2) ) ; x = x + i*i * ( 8*N + 16*k*(-1) + 4*(-1/2) + 8*k*(-1/2) ); x = x + i*i*i * ( 8*(-1) + 4*(-1/2) ) ; }
Rewritting - recalculating:
for (i = 0; i <= N; i++) { x = x + 4*k*(N*N+N) ; // multiple of 1 x = x + i * ( 16*k*N + 2*(N*N+N) - 4*k ) ; // multiple of i x = x + i*i * ( 8*N - 20*k - 2 ) ; // multiple of i^2 x = x + i*i*i * ( -10 ) ; // multiple of i^3 }
Another move to external (and removal of the i loop):
x = x + ( 4*k*(N*N+N) ) * (N+1) ; x = x + ( 16*k*N + 2*(N*N+N) - 4*k ) * ((N*(N+1))/2) ; x = x + ( 8*N - 20*k - 2 ) * ((N*(N+1)*(2*N+1))/6); x = x + (-10) * ((N*N*(N+1)*(N+1))/4) ;
Both the above loop removals use the summation formulas:
Sum(1, i = 0..n) = n+1
Sum(i1, i = 0..n) = n(n + 1)/2
Sum(i2, i = 0..n) = n(n + 1)(2n + 1)/6
Sum(i3, i = 0..n) = n2(n + 1)2/4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With