A group of us (electronic engineering students - UK) have recently been getting to grips, in our own time, with programming the PIC16F84A microcontroller. The need has arisen to multiply two 8-bit numbers together, with no known min/max for each. A fellow student presented the following idea.
multiply_numbers:
; Takes numbers in Num1 and Num2, and returns product in OutH:OutL
clrf OutH ; clear all non-input variables
clrf OutL
mult_loop
bcf STATUS,c ; clear carry bit
movfw Num2
addwf OutL ; add Num2 to OutL
btfsc STATUS,c ; check carry bit
incf OutH ; if set, increment OutH
decfsz Num1 ; decrement Num1
goto mult_loop ; if Num1 is not zero, repeat loop
return ; else return
I felt that this, although quite short in terms of lines of code, could take a relatively long time to execute for larger numbers. I did a bit of thinking myself and started along a route of shifting one number to the right, the other to the left, and adding the left-shifted number a certain amount of times, along the way, to the output to arrive at the final answer. I wasn't quite doing it right, but then stumbled across this question on SO which gave me the idea of expressing one of the input numbers as:
N = a_0 + a_1*2 + a_2*2^2 + a_3*2^3 + ... + a_7*2^7
From that starting point, I came up with this method for multiplying two 8-bit numbers to get a 16-bit output (stored in two 8-bit registers).
multiply_numbers:
; Takes numbers in Num1 and Num2L, and returns product in OutH:OutL
clrf Num2H ; clear all non-input variables
clrf OutL
clrf OutH
mult_loop
btfsc Num1,0 ; test LSB of Num1
call add_num16 ; if set, add Num2H:Num2L to OutH:OutL
call shift_left ; shift Num2H:Num2L left (multiply by 2)
rrf Num1,f ; shift Num1 right
clrw ; clear working register (0x00)
bcf STATUS,z ; clear zero bit (3) of the STATUS register
addwf Num1,w ; add 0x00 to Num1
btfss STATUS,z ; if Num1 is zero, then exit loop
goto mult_loop ; else, continue with another iteration
return
add_num16
movfw Num2H
addwf OutH,f ; add Num2H to OutH
bcf STATUS,c ; clear carry bit (0) of the STATUS register
movfw Num2L
addwf OutL,f ; add Num2L to OutL
btfsc STATUS,c ; check carry bit
incf OutH,f ; increment OutH if set (OutL overflowed)
return
shift_left
bcf STATUS,c ; clear carry bit
rlf Num2L,f ; rotate Num2L left (carry -> LSB, MSB -> carry)
rlf Num2H,f ; rotate Num2H left, using carry bit from Num2L
return
I think this second example is quicker in most cases, simply because the loop will only repeat up to 8 times instead of up to 256 times.
Am I correct in my assumption of their relative speed/efficiency? And does the second block of code actually function as I intend it to (are there any potential problems with it that I've missed)? Lastly, can this multiplication be further optimized using techniques not already employed?
Thank you in advance.
P.S. All variables/registers have been properly defined with their own address. The extensive code commenting is because we are trying to compile a set of routines that we can refer back to in the future and still know at a glance what is going on and why.
P.P.S. This question is related to personal/hobby interest in programming this pic and has nothing to do with any current coursework, etc. Just to allay any suspicions you might have had!
Microcontroller: PIC16F84A
Development environment: MPLABX IDE v1.10
Compiler: mpasm (v5.43)
Method 2 improved:
multiply_numbers:
; Takes numbers in Num1 and Num2, and returns product in OutH:OutL
clrf OutL ; clear all non-input variables
clrf OutH
; 1st iteration
btfsc Num1,7 ; test MSB of Num1
call add_num8 ; if set, add Num2 to OutH:OutL
bcf STATUS,c ; clear carry bit
rlf OutL,f ; rotate OutL left (carry -> LSB, MSB -> carry)
rlf OutH,f ; rotate OutH left, using carry bit from OutL
rlf Num1,f ; shift Num1 left
; 2nd iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 3rd iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 4th iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 5th iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 6th iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 7th iteration
btfsc Num1,7
call add_num8
bcf STATUS,c
rlf OutL,f
rlf OutH,f
rlf Num1,f
; 8th iteration
btfss Num1,7 ; test MSB of Num1
return ; if not set, then return. else...
add_num8
bcf STATUS,c ; clear carry bit (0) of the STATUS register
movfw Num2
addwf OutL,f ; add Num2L to OutL
btfsc STATUS,c ; check carry bit
incf OutH,f ; increment OutH if set (OutL overflowed)
return
Bit-shifting is still faster, but for non-power-of-two mul/div by the time you do all your shifts and add the results it's slower again.
Shifting bits left and right is apparently faster than multiplication and division operations on most, maybe even all, CPUs if you happen to be using a power of 2. However, it can reduce the clarity of code for some readers and some algorithms.
Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands. On simple low-cost processors, typically, bitwise operations are substantially faster than division, several times faster than multiplication, and sometimes significantly faster than addition.
Multiplication. To multiply a number, a binary shift moves all the digits in the binary number along to the left and fills the gaps after the shift with 0: to multiply by two, all digits shift one place to the left. to multiply by four, all digits shift two places to the left.
Yes, but you can likely do better. There are a bunch of classic "tricks" to do this.
First, knowing that the multiplier can be interpreted as a sum of powers of two, you cleverly just add the multiplicand to the multiplier when a multiplicand bit is nonzero.
Second, the value added is only the size of the multiplicand. While you need a 16 (partial and) final product, you don't have to do 16 bit adds; you can do 8 bit adds and propagate any carry. This is usually easy to do in assembler.
To keep the time down, you don't want to call and ADD routine in the middle of you loop. Inline the code to save the time taken by the call, the return, and optimize any register shuffling. Finally, you loop only and exactly 8 times; it it worthwhile to unroll such a loop 8 times to avoid the overhead of the counter, and lower the "register pressure" caused by having to fiddle with counter, giving you more freedom to optimize.
Notice I've said nothing about the PIC controller, and in fact I don't know its instruction set. But what I have said is relevant to anybody implementing an 8 bit multiply. (There are equivlent tricks for 16, 32, and 64 bit multiplies). So on can abstractly write the following code:
mul16: // computes M1 * M2 --> P where M1 and M2 are 8 bit values, P is 16 bits
// P is represent by Plow and Phigh 8 bit values.
// reset the (partial) product
Plow=0; Phigh=0; // all 16 bits
// First iteration:
if msb(M1)==1 then { Plow+=M2; if carry then Phigh++; /* propagate carry */ }
shift M1 left bit;
shift (Phigh,Plow) left one bit
// Second iteration
<same as first>
<3rd ..7th iteration, same as first>
// 8th iteration
if msb(M1)==1 then { Plow+=M2; if carry then Phigh++ }
// dont bother: shift M1 left bit;
// dont bother: shift (Phigh,Plow) left one bit
<done>
You may cleverly notice that what is written as "if msb(M1)..." and "shift M1 left one bit" are often easily implemented with assembly language "shift left" instructions, or in desperation, adding a value to itself :-} Similarly, the "if carry ... add one" is often implemented with an "add carry" instruction.
I leave it to you to recode this for the PIC.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With