I converted a C code to MIPS and I could not understand a part of the MIPS instructions:
#include <inttypes.h>
#include <stdint.h>
uint16_t
chksum(uint16_t sum, const uint8_t *data, uint16_t len)
{
uint16_t t;
const uint8_t *dataptr;
const uint8_t *last_byte;
dataptr = data;
last_byte = data + len - 1;
while (dataptr < last_byte)
{
t = (dataptr[0] << 8) + dataptr[1];
sum += t;
if (sum < t)
{
sum++;
}
dataptr += 2;
}
if (dataptr == last_byte)
{
t = (dataptr[0] << 8) + 0;
sum += t;
if (sum < t)
{
sum++;
}
}
return sum;
}
I used MIPS gcc5.4 on the Godbolt compiler explorer with -O2
optimization, with the default -march
of classic MIPS1 that doesn't have load interlocks:
chksum(unsigned short, unsigned char const*, unsigned short):
andi $6,$6,0xffff
addiu $6,$6,-1
addu $6,$5,$6
sltu $3,$5,$6
beq $3,$0,$L2
andi $2,$4,0xffff
move $4,$5
$L4:
lbu $3,0($4)
lbu $7,1($4)
sll $3,$3,8
addu $3,$3,$7
andi $3,$3,0xffff
addu $2,$3,$2
andi $2,$2,0xffff
addiu $4,$4,2
sltu $3,$2,$3
sltu $7,$4,$6
beq $3,$0,$L3
addiu $8,$2,1
andi $2,$8,0xffff
$L3:
bne $7,$0,$L4
nor $3,$0,$5
addu $3,$3,$6
srl $3,$3,1
addiu $3,$3,1
sll $3,$3,1
addu $5,$5,$3
$L2:
beq $6,$5,$L8
nop
$L9:
j $31
nop
$L8:
lbu $3,0($6)
nop
sll $3,$3,8
addu $2,$3,$2
andi $2,$2,0xffff
sltu $3,$2,$3
beq $3,$0,$L9
nop
addiu $2,$2,1
j $31
andi $2,$2,0xffff
I matched most of the instructions with the code but I could not understand the part in $L3
starting with nor
instruction until the addu
before $L2
.
The Compiler Explorer shows that part is related to the while
but I do not get why it manipulates the $5
before the branch in $L2
.
Let's analyze what the code is doing. A few mappings to make the code easy to follow:
Initial parameters:
$4: sum parameter
$5: data parameter
$6: len parameter
Labels:
$L4: while body
$L3: while condition
$L2: if condition
Registers:
$2: sum
$4: dataptr
$6: last_byte
The relevant code:
// [...]
sltu $3,$5,$6 // $3 = $5 (data parameter) < $6 (last_byte) ? 1 : 0
beq $3,$0,$L2 // if $3 == 0 goto $L2 (if condition)
andi $2,$4,0xffff // $2 (sum) = $4 (sum parameter) & 0xFFFF
move $4,$5 // $4 (dataptr) = $5 (data parameter)
$L4: // while body
// [...]
sltu $7,$4,$6 // $7 = $4 (dataptr) < $6 (last_byte) ? 1 : 0
// [...]
$L3: // while condition
bne $7,$0,$L4 // if $7 != 0 goto $L4 (while body) [1]
nor $3,$0,$5 // $3 = $5 (data) nor 0
addu $3,$3,$6 // $3 += $6 (last_byte)
srl $3,$3,1 // $3 >>= 1
addiu $3,$3,1 // $3++
sll $3,$3,1 // $3 <<= 1
addu $5,$5,$3 // $5 += $3
$L2: // if condition
beq $6,$5,$L8 // if $6 (last_byte) == $5 goto $L8 [2]
The while
loop finishes at [1]
. The rest of the instructions until [2]
compute a value into register $5
for comparison with $6
(last_byte
),
which is the if
in the source code.
The question here is: what is the value in $5
? If you put together all the operations, you get:
$5 = $5 + ((((($5 nor 0) + $6) >> 1) + 1) << 1)
Let's unravel this expression. First, realize that:
x NOR 0 = NOT(x OR 0) = ~(x | 0) = ~x
So it is just negating (one's complement) on $5
.
Then, it adds $6
, which is last_byte
.
The next 3 operations (>> 1
, + 1
, << 1
) are a way to compute the next even integer. See what happens with a few cases:
0000 (0) -> 0010 (2)
0001 (1) -> 0010 (2)
0010 (2) -> 0100 (4)
0011 (3) -> 0100 (4)
0100 (4) -> 0110 (6)
0101 (5) -> 0110 (6)
0110 (6) -> 1000 (8)
0111 (7) -> 1000 (8)
Finally, it is adding the original value of $5
, which was the data
parameter.
If you put all together, and replace with the names of the C variables for clarity, you arrive at:
$5 = data + next_even(~data + last_byte)
Recall that, for two's complement integers:
x - y == x + ~y + 1
Therefore:
$5 = data + next_even(last_byte - data - 1)
= data + next_even(len - 2)
Now, computing the next even value after subtracting 2
is basically removing the lowest bit of information; in other words, a "floor" to the even numbers. This can be expressed as returning the same number if it is even, or one less if it is odd, i.e.:
$5 = data + (len % 2 ? len : len - 1)
Finally, the compiler compares this register to $6
(last_byte
). Simplifying:
last_byte == data + (len % 2 ? len : len - 1)
data + len - 1 == data + (len % 2 ? len : len - 1)
len - 1 == len % 2 ? len : len - 1
len % 2 != 0
Now we can also see that the expression actually only depends on len
, and not on data
.
The compiler, with all those instructions, is effectively re-computing dataptr
from data
and last_bytes
. Indeed, if you consider that dataptr
is only ever advanced from data
in increments of 2
, we can rewrite it as:
data + 2 * n_increments
data + 2 * (len / 2)
data + (len % 2 ? len : len - 1)
Which is precisely the value of $5
computed above.
Knowing this, one can wonder why the compiler arrived at this solution. The same happens for a recent version of GCC (8.1.0) and x86-64:
mov rdx, rsi
not rdx
add rdx, r8
shr rdx
lea rsi, [rsi+2+rdx*2]
It is clear that the optimizer is realizing that the final value of dataptr
can be computed independently of the while
loop -- however, it is unclear why it decides to do so instead of picking the value from the register. Maybe it has decided that avoiding the dependency on the result of the loop is faster (due to instruction pipelining) than otherwise.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With