Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize this assembly code

I'm taking a Computer Architecture course right now, and we're going over basic R-type and I-type instructions (also, this is a RISC architecture), etc. I can't seem to figure out how to optimize this code.

Explanation: This code adds words in an array (pointed to by $s1) of numbers until a zero is reached. The result is stored in $t1. $t0 holds the current word.

        add   $t1, $zero, $zero      # Initialize result to zero
again:  
        lw    $t0, 0($s1)            # Load the word from the array
        beq   $t0, $zero, done       # Terminate if current word is a zero
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array
        beq   $t1, $t1, again        # Loop again
done:
        nop                          # Do nothing

I'm having a difficult time optimizing the code. I feel the beq $t1, $t1, again (since it's always true) is unnecessary, but I'm not sure how to remove it. Here's my attempt, but I now realize that my code would not terminate.

        add   $t1, $zero, $zero      # Initialize result to zero
again:  
        lw    $t0, 0($s1)            # Load the word from the array
        add   $t1, $t1, $t0          # Add current word to result
        addi  $s1, $s1, 4            # Point to the next word in the array
        bne   $t1, $zero, again      # If result is not zero, loop
done:
        nop                          # Do nothing

I'm never checking for a terminating zero and jumping to done. But if I add another check, then wouldn't the code be the same as before?

like image 678
ardavis Avatar asked Jan 26 '12 19:01

ardavis


People also ask

Can assembly be optimized?

There are many different types of code optimization when it comes to assembly or assembler code. There is of course most popular speed optimization that focuses on the fastest possible code, often with the use of MMX, SSE, AVX instructions to process as much data as possible.

How do you optimize a program code?

Optimize Program Algorithm For any code, you should always allocate some time to think the right algorithm to use. So, the first task is to select and improve the algorithm which will be frequently used in the code. 2. Avoid Type Conversion Whenever possible, plan to use the same type of variables for processing.

Do assemblers optimize code?

Assembler does no optimization. It takes your code as is and converts to machine code. Compilers on the other hand can optimize your code; the resulting assembly is already optimized.

What is optimize the code?

Code optimization is any method of code modification to improve code quality and efficiency. A program may be optimized so that it becomes a smaller size, consumes less memory, executes more rapidly, or performs fewer input/output operations.


1 Answers

Typically you want to convert a test at the top loop into a test at the bottom loop. To do that, you frequently have to jump into (more or less) the middle of the loop body for the first iteration. In pseudo code, what you have right now is basically:

    initialize sum
beginning:
    load a word
    if (done) goto end
    add to sum
    increment pointer
    goto beginning
end:

to optimize that, we want to change the structure to something like this:

    initialize sum
    goto start_loop

beginning:
    add to sum
    increment pointer
start_loop:
    load a word
    if (!done) goto beginning

This way there's only one jump per loop instead of two (and it's a short backward jump, so the target will nearly always be in the cache).

That said, I should add that this optimization is really mostly obsolete -- with decent branch prediction, an unconditional jump is usually essentially free.

Edit: since loop unrolling has been mentioned, I'll add my two cents about it. Branch prediction generally renders loop unrolling obsolete, unless you can use it to execute more instructions in parallel. That's not an issue here, but often useful in real life. For example, if we assume a CPU with two separate adders available, we can unroll two iterations of the loop, and add the results from those iterations into two separate target registers, so we take advantage of both adders by adding two values at the same time. Then, when the loop terminates, we add together those two registers to get the final value. In C-like psuedo-code, that would come out something like this:

odds = 0
evens = 0

do {   
    evens += pointer[0];
    odds += pointer[1];
    pointer += 2;
while (pointer[0] && pointer[1]);
total = odds + evens;

As written, this adds the minor extra requirements for two consecutive zeros to terminate the sequence instead of just one, and a minimum of two items in the array to be added. Note, however, that it's not really the loop unrolling that gives the major advantage here, but the parallel execution.

Absent that, we really only see a benefit from loop unrolling if a branch that's not taken is cheaper than a branch that is taken (even if both are predicted correctly). On older processors (e.g., older Intels) taking a branch did carry a penalty relative to a non-taken branch. At the same time, the unrolled loop will use more cache space. Thus, on a modern processor, it's frequently an overall loss (unless, as I said before, we can use the unrolling to gain parallelism).

like image 155
Jerry Coffin Avatar answered Oct 13 '22 11:10

Jerry Coffin