I am trying to understand how the AVR-GCC compiler compiles a multiplication.
Input C code:
unsigned char square(unsigned char num) {
return num * num;
}
Output assembly code:
square(unsigned char):
mul r24,r24
mov r24,r0
clr r1
ret
My question is why the compiler adds the clr r1 instruction. Seemingly, this instruction could be removed and the result would still be as desired, assuming the parameter is passed in r24 and the return value is expected in r24.
Direct Godbolt link: https://godbolt.org/z/PsPS_N
I also found a related, more general discussion here.
That would be a matter of the AVR ABI used by GCC. In particular:
R1 always contains zero. During an insn the content might be destroyed, e.g. by a MUL instruction that uses R0/R1 as implicit output register. If an insn destroys R1, the insn must restore R1 to zero afterwards. [...]
And that's exactly what you see in the assembly. R1 is clobbered by the MUL, so it must afterward be cleared to zero.
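To make the effect visible, here is a minimal host-side sketch that replays the four instructions from the question on a toy register file. The names avr_regs, avr_mul and square_model are made up for this illustration and are not part of avr-gcc or avr-libc; only the behavior of MUL (16-bit product written to R1:R0) is taken from the AVR instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the AVR register file: just 32 byte-wide registers. */
typedef struct {
    uint8_t r[32];
} avr_regs;

/* MUL Rd,Rr: unsigned 8x8 multiply, the 16-bit product goes to R1:R0. */
void avr_mul(avr_regs *cpu, int rd, int rr)
{
    uint16_t product = (uint16_t)(cpu->r[rd] * cpu->r[rr]);
    cpu->r[0] = (uint8_t)(product & 0xFF);
    cpu->r[1] = (uint8_t)(product >> 8);
}

/* Replays "mul r24,r24 ; mov r24,r0 ; clr r1 ; ret".
   With do_clr == 0 the CLR is skipped to show what would be left in R1. */
uint8_t square_model(avr_regs *cpu, uint8_t num, int do_clr)
{
    assert(cpu->r[1] == 0);      /* ABI assumption: R1 is zero on entry */
    cpu->r[24] = num;            /* argument arrives in r24 */
    avr_mul(cpu, 24, 24);        /* mul r24,r24 */
    cpu->r[24] = cpu->r[0];      /* mov r24,r0 */
    if (do_clr)
        cpu->r[1] = 0;           /* clr r1 */
    return cpu->r[24];           /* result is returned in r24 */
}
```

For num = 200 the product is 40000 = 0x9C40, so the return value is 0x40 either way, but without the CLR the caller would find 0x9C in R1 and any later code relying on R1=0 would misbehave.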
When GCC's AVR backend was implemented and the avr-gcc ABI was devised, it turned out that code generation can be improved in some situations when there is a register that is known to contain 0. The author chose R1 back then, i.e. when avr-gcc is printing assembly instructions, one may assume that R1=0, like in this example:
unsigned add (unsigned x, unsigned char y)
{
if (x != 64)
return x + y;
else
return x;
}
This compiles with -c -Os -save-temps to the code below. It uses R1, a.k.a. __zero_reg__, so it can print a shorter instruction sequence:
__zero_reg__ = 1
add:
cpi r24,64
cpc r25,__zero_reg__
breq .L2
add r24,r22
adc r25,__zero_reg__
.L2:
ret
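As a rough illustration of why a zero register helps here: ADC only takes register operands, so propagating the carry into the high byte needs some register that holds 0. The sketch below models the add/adc pair in portable C; the function name add_u16_u8 and the local variable zero_reg are assumptions made for this example only.

```c
#include <stdint.h>

/* Models "add r24,r22 ; adc r25,__zero_reg__".
   x lives in R25:R24, y in R22, and R1 is assumed to hold 0. */
uint16_t add_u16_u8(uint16_t x, uint8_t y)
{
    uint8_t lo = (uint8_t)(x & 0xFF);        /* r24 */
    uint8_t hi = (uint8_t)(x >> 8);          /* r25 */
    const uint8_t zero_reg = 0;              /* r1, a.k.a. __zero_reg__ */

    unsigned sum = (unsigned)lo + y;         /* add r24,r22 */
    uint8_t carry = (uint8_t)(sum >> 8);     /* carry flag after the add */
    lo = (uint8_t)sum;

    hi = (uint8_t)(hi + zero_reg + carry);   /* adc r25,__zero_reg__ */

    return (uint16_t)((hi << 8) | lo);
}
```

If R1 did not reliably contain 0, the compiler would first have to zero some other register before the ADC, which costs an extra instruction and a register.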
R1 was chosen because on AVR the higher registers are more powerful (for example, only R16 to R31 accept immediate operands in instructions such as LDI or CPI), and therefore register allocation starts, with a grain of salt, at the higher registers, so the low registers are used last. Thus a register with a small register number was chosen.
This special register is not managed by the register allocator; it is "fixed" and managed by hand. This was all simple with the early AVRs, which didn't support MUL instructions. With the introduction of MUL and cousins, however, things got more complicated because MUL uses the register pair R1:R0 as implicit output register and hence overwrites the 0 held in __zero_reg__.
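Thus you can implement two approaches:
1. CLR __zero_reg__ prior to each use, so that R1 contains 0 when it is needed.
2. CLR __zero_reg__ after each instruction (sequence) that clobbers R1, so that R1 contains 0 again afterwards.
The avr backend implements approach 2.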
Because in the current avr backend (at least up to v10) this register is managed by hand, the compiler has no information about whether clearing that register is actually needed or might be omitted:
unsigned char mul (unsigned char x)
{
return x * x * x;
}
With -c -Os -mmcu=atmega8 -save-temps, this produces:
mul:
mul r24,r24
mov r25,r0
clr r1
mul r25,r24
mov r24,r0
clr r1
ret
i.e. R1 is cleared twice even though, right after the first CLR, the MUL instruction overwrites it again. In principle, the avr backend could track which instructions clobber R1 and which instructions (or instruction sequences) require R1=0; however, this is currently (v10) not implemented.
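Just to illustrate what such tracking could look like, here is a toy peephole pass over a made-up instruction classification. This is not how GCC represents code and not a proposal for the real backend; toy_op and drop_redundant_clr are hypothetical names, and the pass only handles straight-line code.

```c
#include <stddef.h>

/* Hypothetical classification of instructions with respect to R1. */
typedef enum {
    OP_MUL,         /* clobbers R1 (writes R1:R0) */
    OP_CLR_R1,      /* restores R1 to 0 */
    OP_NEEDS_ZERO,  /* relies on R1 == 0 (e.g. adc rX,r1, or ret/call/branch) */
    OP_OTHER        /* straight-line insn that neither reads nor writes R1 */
} toy_op;

/* Removes every CLR R1 whose value is overwritten by a MUL before any
   instruction could observe it. Compacts the array in place and returns
   the new length. Anything unknown is kept, so the pass is conservative. */
size_t drop_redundant_clr(toy_op *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i] == OP_CLR_R1) {
            /* Skip ahead over instructions that do not care about R1. */
            size_t j = i + 1;
            while (j < n && code[j] == OP_OTHER)
                j++;
            if (j < n && code[j] == OP_MUL)
                continue;           /* dead CLR: drop it */
        }
        code[out++] = code[i];
    }
    return out;
}
```

Applied to the x * x * x listing above (mul, mov, clr, mul, mov, clr, ret), this would drop the first CLR and keep the second one, because the ABI requires R1=0 when the function returns.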
The introduction of MUL led to yet another complication: R1 is no longer always zero, i.e. when an interrupt triggers right after a MUL, the register is in general not zero. Thus an interrupt service routine (ISR) must save and restore R1 whenever it might use it:
#include <avr/interrupt.h>
char volatile v;
ISR (__vector_1)
{
v = 0;
}
Compiling, assembling and then running avr-objdump -d on the object file gives:
00000000 <__vector_1>:
   0:   1f 92           push    r1
   2:   1f b6           in      r1, 0x3f
   4:   1f 92           push    r1
   6:   11 24           eor     r1, r1
   8:   10 92 00 00     sts     0x0000, r1
   c:   1f 90           pop     r1
   e:   1f be           out     0x3f, r1
  10:   1f 90           pop     r1
  12:   18 95           reti
The payload of the ISR is just sts ..., r1, which stores 0 to v. This requires R1=0, hence the need for clr r1 (shown as eor r1, r1 in the disassembly, which is the same instruction), hence the save and restore of R1 by means of push and pop. The clr clobbers the program status (SREG at I/O address 0x3f), thus SREG must also be saved and restored around that sequence. To accomplish that, the compiler uses r1 as a scratch register, because special function registers cannot be used with push/pop.
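The save/restore dance can be replayed on the host with a small model. The sketch below is only an illustration under simplifying assumptions: isr_model and run_isr are invented names, the stack is a tiny array, and the hardware handling of the I flag on interrupt entry and reti is ignored.

```c
#include <stdint.h>

/* Minimal state touched by the ISR listing above. */
typedef struct {
    uint8_t r1;        /* may hold MUL leftovers when the IRQ hits */
    uint8_t sreg;      /* status register, I/O address 0x3f */
    uint8_t stack[4];  /* toy stack, grows upwards for simplicity */
    int sp;            /* number of bytes currently pushed */
    uint8_t v;         /* stands in for the volatile variable v */
} isr_model;

void run_isr(isr_model *m)
{
    m->stack[m->sp++] = m->r1;      /* push r1        save R1 */
    m->r1 = m->sreg;                /* in r1, 0x3f    SREG via scratch r1 */
    m->stack[m->sp++] = m->r1;      /* push r1        save SREG */
    m->r1 = 0;                      /* eor r1, r1     R1 = 0 ... */
    /* ... which also changes flags: Z is set; S, V and N are cleared. */
    m->sreg = (uint8_t)((m->sreg & ~0x1Eu) | 0x02u);
    m->v = m->r1;                   /* sts v, r1      the actual payload */
    m->r1 = m->stack[--m->sp];      /* pop r1         fetch saved SREG */
    m->sreg = m->r1;                /* out 0x3f, r1   restore SREG */
    m->r1 = m->stack[--m->sp];      /* pop r1         restore R1 */
}
```

Whatever R1 and SREG contained when the interrupt hit, they hold the same values again after run_isr returns, which is what the interrupted code (possibly in the middle of a MUL sequence) relies on.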
Apart from that, there are situations where zero-reg is not reset after a MUL:
int square (int a)
{
return a * a;
}
compiles to:
mul r24,r24
movw r18,r0
mul r24,r25
add r19,r0
add r19,r0
clr r1
movw r24,r18
ret
The reason there is no CLR after the first MUL is that the multiplication sequence is internally represented and then emitted as one chunk (insn), hence there is knowledge that no intermediate CLR is needed. In the example from above with x * x * x, however, the internal representation is two insns, one for each multiplication.
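To convince yourself that the sequence with a single CLR at the end really computes the low 16 bits of a * a (int is 16 bits wide on AVR), you can replay it on the host. This is only a sketch; the function names are invented for this example and the locals are named after the registers in the listing.

```c
#include <stdint.h>

/* Replays the 16-bit square listing above, one statement per instruction. */
uint16_t square16_model(uint16_t a)
{
    uint8_t r24 = (uint8_t)(a & 0xFF);   /* low byte of a */
    uint8_t r25 = (uint8_t)(a >> 8);     /* high byte of a */
    uint8_t r0, r1, r18, r19;
    uint16_t prod;

    prod = (uint16_t)(r24 * r24);        /* mul r24,r24 */
    r0 = (uint8_t)(prod & 0xFF);
    r1 = (uint8_t)(prod >> 8);
    r18 = r0; r19 = r1;                  /* movw r18,r0 */

    prod = (uint16_t)(r24 * r25);        /* mul r24,r25 */
    r0 = (uint8_t)(prod & 0xFF);
    r1 = (uint8_t)(prod >> 8);
    r19 = (uint8_t)(r19 + r0);           /* add r19,r0 */
    r19 = (uint8_t)(r19 + r0);           /* add r19,r0 */

    r1 = 0;                              /* clr r1 (only once, at the end) */
    (void)r1;

    r24 = r18; r25 = r19;                /* movw r24,r18 */
    return (uint16_t)((r25 << 8) | r24);
}

/* Exhaustively compares the model against plain C arithmetic.
   Returns 1 if all 65536 inputs agree, 0 otherwise. */
int square16_model_matches_c(void)
{
    for (uint32_t a = 0; a <= 0xFFFF; a++)
        if (square16_model((uint16_t)a) != (uint16_t)(a * a))
            return 0;
    return 1;
}
```

With a = 256*h + l, the square is l*l + 512*h*l + 65536*h*h. The last term vanishes modulo 65536, which is why two multiplications and a doubled add into the high byte suffice.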