Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I convert a signed 8-bit byte to a signed 16-bit integer in assembly?

Using an Arduino, I have to write a function in Atmel AVR Assembly for my computer science class that converts a signed 8-bit byte to a signed 16-bit integer. I am not allowed to use any branching instructions either (but skips are fine).

I know it's wrong, but this is all I have gotten so far:

.global byteToInt
byteToInt:
  sbrc r24, 7
  ldi r25, 1
  asr r25
  ret

Does anyone know how I would make this function work? Any help would be much appreciated!

like image 953
stealthbomber10 Avatar asked Nov 29 '16 06:11

stealthbomber10


1 Answers

Well obviously you need to copy the sign bit of the char to every bit in the upper half. On most architectures, the straightforward would be to copy a register and arithmetic-right-shift it by 7. But AVR only has a shift-by-1 instruction, so we can't efficiently do that.

Another trick for conditionally getting 0 or -1 into a register is subtract-with-borrow a register from itself to get 0 - C. e.g. sbc r25, r25.

Now we just need a way to set the Carry flag if the 8-bit number is negative, i.e. if it's > 127 when treated as an unsigned integer, because C is always set based on the unsigned interpretation of things. AVR has a compare-immediate instruction, CPI, but it only works for r16-r31, not for low registers. Also, it sets the C flag opposite of what we really want, so we'd have to use another instruction to invert the result. So I think we're better off doing the compare the other way against a value in a register:

; Most efficient way, I think:
sign_extend:
    ldi   r25, 127       ; can be hoisted out of loops, and any reg is fine.

    cp    r25, r24        ; C = (r24 < 0)
    sbc   r25, r25        ; r25 = (r24 < 0) ? -1 : 0
    ; result in r25:r24

Even better, if you need to do this in a loop, you can keep 127 in a different register.

With CPI, you'd do this:

; slightly worse: only works with r16-r31, and worse in loops
sign_extend:
    cpi   r24, 127        ; C = (r24 < 128U) = ((signed)r24 >= 0)
    sbc   r25, r25        ; r25 = (r24>=0) ? -1 : 0
    com   r25             ; ones-complement negation: 0 : -1

Or, to avoid a restriction on which register is used, do the compare the other way:

I've never worked with AVR, so I'm just basing this off the instruction set reference manual that google found (and my knowledge of asm for other ISAs, like x86 and ARM). According to those docs, all of those instructions are 1 word (2 bytes), with 1 cycle latency. This is better than what gcc4.5 does:


The usual way to find good instruction sequences is to ask a compiler AVR gcc4.5 -O3 on godbolt does this:

short sign_extend(signed char a) { return a; }

sign_extend:
    mov r18,r24     ;; IDK why gcc uses r18 and r19.

    clr r19
    sbrc r18,7
    com r19

    mov r25,r19
    ret

So it zeros R19, then uses SBRC to conditionally execute a logical-not (COM), depending on the sign bit (bit 7) of R18.

I'm not sure what the extra MOVs are for. I'm also not sure why it inverts the zero instead of setting all bits with no input-dependency. (e.g. ldi r19, $FF, or the SBR alias for it. If an out-of-order-execution AVR ever existed, then that would be more efficient. :P

I'm not sure what the MOV instructions are for. SBRC is non-destructive. So AFAICT, a valid implementation would be

sign_extend:
    clr   r25
    sbrc  r24,7
    ldi   r25, $FF
    ret

This is still worse than CP / SBC, because SBRC takes 2 cycles if the skip is taken.


I assume that SBC's "false dependency" on the old value of R25 isn't a thing on AVR. On out-of-order x86 CPUs, only AMD recognizes sbb eax, eax as being independent of the old value of eax, and depending only on flags. Intel CPUs just run it normally. (They do recognize instructions like xor eax,eax as independent, and it's the standard zeroing-idiom for x86.)

So on non-AMD CPUs, if the last code that wrote EAX did so with a load that missed in cache, or something else high-latency, sbb eax, eax couldn't execute even if the flags were ready (i.e. from an independent dependency chain). But on AMD CPUs, it would start a new dependency chain for EAX.

Anyway, I assume AVR is a fairly simple in-order pipelined design, so there's no way for an old register to be a performance land-mine unless the code that did (for example) a cache-miss load into it never used the result. (Even in-order pipelines don't need to wait for high-latency operations until something uses the result.)

like image 193
Peter Cordes Avatar answered Dec 24 '22 16:12

Peter Cordes