The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.
typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y) {
*dest = x*y;
}
gcc generates the following assembly code implementing the computation: dest at %ebp+8, x at %ebp+12, y at %ebp+16
1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)
This code uses three multiplications to implement the multiprecision arithmetic required to implement 64-bit arithmetic on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm.
I don't understand line 8 and line 9 in assembly code above. Can anyone help?
I've converted it to intel syntax.
mov esi, y_low
mov eax, x
mov edx, eax
sar edx, 31
mov ecx, y_high
imul ecx, eax ; ecx = y_high *{signed} x
mov ebx, edx
imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low
add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low
mul esi ; edx:eax = x_low *{unsigned} y_low
lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)
mov ecx, dest
mov [ecx], eax
mov [ecx + 4], edx
What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.
Where does the other 64-bit multiplicand come from? It's x
sign-extended from 32 bits to 64. The sar
instruction is used to replicate x's
sign bit into all bits of edx
. I call this value consisting only of the x's
sign x_high
. x_low
is the value of x
actually passed into the routine.
y_low
and y_high
are the least and most significant parts of y
, just like x's
x_low
and x_high
are.
From here it's pretty easy:
product = y
*{signed} x
=
(y_high
* 232 + y_low
) *{signed} (x_high
* 232 + x_low
) =y_high
*{signed} x_high
* 264 +y_high
*{signed} x_low
* 232 +y_low
*{signed} x_high
* 232 +y_low
*{signed} x_low
y_high
*{signed} x_high
* 264 isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).
y_low
*{signed} x_low
is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
-1 *{signed} -1 = 1
0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)
Consider the context of line 8 and 9.
By this time, ESI
contains the lower half of y
and EBX
contains sgn(x)
. So line 8 is just computing sgn(x) * (y % 2^32)
and storing it in EBX
.
Line 9 draws upon that result. By the time Line 9 happens, ECX
contains a partial upper half of the multiplication, that is, x * (y >> 32)
signed. So EBX+ECX
ends up being what we computed in the last step plus the partial upper half we found on a previous line.
The full algorithm itself is pretty neat ;)
EDIT: In response to a comment below...
Line 4: Consider what SAR EDX, 31
(or if you like, sar $31, %edx
) really means. Since EDX
is a 32-bit register, you'll end up with one of two values. Which two? Consider what they mean in the context of signed arithmetic.
Line 7: EDX
by this point contains something pretty useful for the following operations. I'm just moving it where it needs to go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With