Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

implement 64-bit arithmetic on a 32-bit machine

The following code computes the product of x and y and stores the result in memory. Data type ll_t is defined to be equivalent to long long.

typedef long long ll_t;
void store_prod(ll_t *dest, int x, ll_t y) {
    *dest = x*y;
}

gcc generates the following assembly code implementing the computation: dest at %ebp+8, x at %ebp+12, y at %ebp+16

1 movl 16(%ebp), %esi
2 movl 12(%ebp), %eax
3 movl %eax, %edx
4 sarl $31, %edx
5 movl 20(%ebp), %ecx
6 imull %eax, %ecx
7 movl %edx, %ebx
8 imull %esi, %ebx
9 addl %ebx, %ecx
10 mull %esi
11 leal (%ecx,%edx), %edx
12 movl 8(%ebp), %ecx
13 movl %eax, (%ecx)
14 movl %edx, 4(%ecx)

This code uses three multiplications to implement the multiprecision arithmetic required to implement 64-bit arithmetic on a 32-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm.

I don't understand line 8 and line 9 in assembly code above. Can anyone help?

like image 963
morsel.wang Avatar asked Jul 27 '12 02:07

morsel.wang


2 Answers

I've converted it to intel syntax.

mov esi, y_low
mov eax, x
mov edx, eax
sar edx, 31
mov ecx, y_high

imul ecx, eax ; ecx = y_high *{signed} x

mov ebx, edx

imul ebx, esi ; ebx = sign_extension(x) *{signed} y_low

add ecx, ebx ; ecx = y_high *{signed} x_low + x_high *{signed} y_low

mul esi ; edx:eax = x_low *{unsigned} y_low

lea edx, [ecx + edx] ; edx = high(x_low *{unsigned} y_low + y_high *{signed} x_low + x_high *{signed} y_low)

mov ecx, dest
mov [ecx], eax
mov [ecx + 4], edx

What the above code does is multiplication of 2 64-bit signed integers that keeps the least-significant 64 bits of the product.

Where does the other 64-bit multiplicand come from? It's x sign-extended from 32 bits to 64. The sar instruction is used to replicate x's sign bit into all bits of edx. I call this value consisting only of the x's sign x_high. x_low is the value of x actually passed into the routine.

y_low and y_high are the least and most significant parts of y, just like x's x_low and x_high are.

From here it's pretty easy:

product = y *{signed} x =
(y_high * 232 + y_low) *{signed} (x_high * 232 + x_low) =
y_high *{signed} x_high * 264 +
y_high *{signed} x_low * 232 +
y_low *{signed} x_high * 232 +
y_low *{signed} x_low

y_high *{signed} x_high * 264 isn't calculated because it doesn't contribute to the least significant 64 bits of the product. We'd calculate it if we were interested in the full 128-bit product (full 96-bit product for the picky).

y_low *{signed} x_low is calculated using unsigned multiplication. It's legal to do so because 2's complement signed multiplication gives the same least significant bits as unsigned multiplication. Example:
-1 *{signed} -1 = 1
0xFFFFFFFFFFFFFFFF *{unsigned} 0xFFFFFFFFFFFFFFFF = 0xFFFFFFFFFFFFFFFE0000000000000001 (64 least significant bits are equivalent to 1)

like image 120
Alexey Frunze Avatar answered Nov 11 '22 15:11

Alexey Frunze


Consider the context of line 8 and 9.

By this time, ESI contains the lower half of y and EBX contains sgn(x). So line 8 is just computing sgn(x) * (y % 2^32) and storing it in EBX.

Line 9 draws upon that result. By the time Line 9 happens, ECX contains a partial upper half of the multiplication, that is, x * (y >> 32) signed. So EBX+ECX ends up being what we computed in the last step plus the partial upper half we found on a previous line.

The full algorithm itself is pretty neat ;)

EDIT: In response to a comment below...

Line 4: Consider what SAR EDX, 31 (or if you like, sar $31, %edx) really means. Since EDX is a 32-bit register, you'll end up with one of two values. Which two? Consider what they mean in the context of signed arithmetic.

Line 7: EDX by this point contains something pretty useful for the following operations. I'm just moving it where it needs to go.

like image 28
atomicinf Avatar answered Nov 11 '22 15:11

atomicinf