Using an Arduino, I have to write a function in Atmel AVR Assembly for my computer science class that converts a signed 8-bit byte to a signed 16-bit integer. I am not allowed to use any branching instructions either (but skips are fine).
I know it's wrong, but this is all I have gotten so far:
.global byteToInt
byteToInt:
sbrc r24, 7
ldi r25, 1
asr r25
ret
Does anyone know how I would make this function work? Any help would be much appreciated!
Well obviously you need to copy the sign bit of the char
to every bit in the upper half. On most architectures, the straightforward would be to copy a register and arithmetic-right-shift it by 7. But AVR only has a shift-by-1 instruction, so we can't efficiently do that.
Another trick for conditionally getting 0 or -1 into a register is subtract-with-borrow a register from itself to get 0 - C
. e.g. sbc r25, r25
.
Now we just need a way to set the Carry flag if the 8-bit number is negative, i.e. if it's > 127 when treated as an unsigned integer, because C is always set based on the unsigned interpretation of things. AVR has a compare-immediate instruction, CPI, but it only works for r16-r31, not for low registers. Also, it sets the C flag opposite of what we really want, so we'd have to use another instruction to invert the result. So I think we're better off doing the compare the other way against a value in a register:
; Most efficient way, I think:
sign_extend:
ldi r25, 127 ; can be hoisted out of loops, and any reg is fine.
cp r25, r24 ; C = (r24 < 0)
sbc r25, r25 ; r25 = (r24 < 0) ? -1 : 0
; result in r25:r24
Even better, if you need to do this in a loop, you can keep 127 in a different register.
With CPI, you'd do this:
; slightly worse: only works with r16-r31, and worse in loops
sign_extend:
cpi r24, 127 ; C = (r24 < 128U) = ((signed)r24 >= 0)
sbc r25, r25 ; r25 = (r24>=0) ? -1 : 0
com r25 ; ones-complement negation: 0 : -1
Or, to avoid a restriction on which register is used, do the compare the other way:
I've never worked with AVR, so I'm just basing this off the instruction set reference manual that google found (and my knowledge of asm for other ISAs, like x86 and ARM). According to those docs, all of those instructions are 1 word (2 bytes), with 1 cycle latency. This is better than what gcc4.5 does:
The usual way to find good instruction sequences is to ask a compiler AVR gcc4.5 -O3
on godbolt does this:
short sign_extend(signed char a) { return a; }
sign_extend:
mov r18,r24 ;; IDK why gcc uses r18 and r19.
clr r19
sbrc r18,7
com r19
mov r25,r19
ret
So it zeros R19, then uses SBRC to conditionally execute a logical-not (COM), depending on the sign bit (bit 7) of R18.
I'm not sure what the extra MOVs are for. I'm also not sure why it inverts the zero instead of setting all bits with no input-dependency. (e.g. ldi r19, $FF
, or the SBR alias for it. If an out-of-order-execution AVR ever existed, then that would be more efficient. :P
I'm not sure what the MOV instructions are for. SBRC is non-destructive. So AFAICT, a valid implementation would be
sign_extend:
clr r25
sbrc r24,7
ldi r25, $FF
ret
This is still worse than CP / SBC, because SBRC takes 2 cycles if the skip is taken.
I assume that SBC's "false dependency" on the old value of R25 isn't a thing on AVR. On out-of-order x86 CPUs, only AMD recognizes sbb eax, eax
as being independent of the old value of eax, and depending only on flags. Intel CPUs just run it normally. (They do recognize instructions like xor eax,eax
as independent, and it's the standard zeroing-idiom for x86.)
So on non-AMD CPUs, if the last code that wrote EAX did so with a load that missed in cache, or something else high-latency, sbb eax, eax
couldn't execute even if the flags were ready (i.e. from an independent dependency chain). But on AMD CPUs, it would start a new dependency chain for EAX.
Anyway, I assume AVR is a fairly simple in-order pipelined design, so there's no way for an old register to be a performance land-mine unless the code that did (for example) a cache-miss load into it never used the result. (Even in-order pipelines don't need to wait for high-latency operations until something uses the result.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With