
Mixed destination/source operand order in RISC-V assembly syntax

Tags: assembly, riscv

Most instructions in RISC-V assembler order the destination operand before the source one, e.g.:

li  t0, 22        # destination, source
li  t1, 1         # destination, source
add t2, t0, t1    # destination, source

But the store instructions have that order reversed:

sb    t0, (sp)    # source, destination
lw    t1, (a0)    # destination, source
vlb.v v4, (a1)    # destination, source
vsb.v v5, (a2)    # source, destination

How come?

What is the motivation for this (arguably) asymmetric assembler syntax design?

maxschlepzig asked Jan 18 '20 16:01


1 Answer

I don't see a real inconsistency in RISC-V assembly when it comes to destination and source operands: The destination operand – when it's part of the instruction encoding – always corresponds to the first operand in the assembly language.

Consider the following instruction examples from four of the six base instruction formats:

  • R-type: add t0, t1, t2
  • I-type: addi t0, t1, 11
  • J-type: jal ra, off
  • U-type: lui t0, 0x12345

In each of the assembly instructions above, the destination operand is the first operand, and it corresponds to the destination register field (rd) in the instruction encoding.¹
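As a concrete check of that mapping, the Python sketch below (not part of the original answer) hand-assembles the four example instructions using the standard RV32I field layouts and reads back bits 11:7, the rd field. The helper names (r_type, i_type, u_type, j_type, rd_of) are hypothetical, and the off in jal ra, off is arbitrarily assumed to be 8.

```python
# Hedged sketch: hand-assemble four RV32I instructions and read back bits 11:7,
# the rd (destination register) field used by the R, I, U and J formats.
# Register numbers use the standard ABI names (ra = x1, t0..t2 = x5..x7).
ABI = {"ra": 1, "t0": 5, "t1": 6, "t2": 7}


def r_type(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def i_type(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def u_type(opcode: int, rd: int, imm20: int) -> int:
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode


def j_type(opcode: int, rd: int, offset: int) -> int:
    # The J-type immediate is stored as imm[20|10:1|11|19:12] in bits 31..12.
    imm = offset & 0x1FFFFF
    scrambled = (
        (((imm >> 20) & 0x1) << 31)
        | (((imm >> 1) & 0x3FF) << 21)
        | (((imm >> 11) & 0x1) << 20)
        | (((imm >> 12) & 0xFF) << 12)
    )
    return scrambled | (rd << 7) | opcode


def rd_of(word: int) -> int:
    """Return the rd field (bits 11:7) of a 32-bit instruction word."""
    return (word >> 7) & 0x1F


examples = {
    "add t0, t1, t2": r_type(0x33, ABI["t0"], 0b000, ABI["t1"], ABI["t2"], 0b0000000),
    "addi t0, t1, 11": i_type(0x13, ABI["t0"], 0b000, ABI["t1"], 11),
    "lui t0, 0x12345": u_type(0x37, ABI["t0"], 0x12345),
    "jal ra, 8": j_type(0x6F, ABI["ra"], 8),  # offset of 8 bytes chosen for illustration
}

for asm, word in examples.items():
    # In every case the first assembly operand is the register found in rd.
    print(f"{asm:<16} 0x{word:08x}  rd = x{rd_of(word)}")
```

It should report x5 (t0) for the first three instructions and x1 (ra) for jal, i.e. always the first assembly operand.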

Now, let's focus on the store instructions (S-type format). As an example, consider the following store instruction:

sw t0, 8(sp)

I think it is crystal clear that t0 above is a source operand since the store instruction stores its contents in memory.

It is tempting to think that 8(sp) is a destination operand. However, take a close look at the S-type instruction format:

[Figure: S-type instruction format]

We can tell that the 8(sp) part of the assembly instruction above isn't really a single operand but two: the immediate 8 (imm) and the source register sp (rs1). If the instruction could instead be expressed like this (similar to addi)²:

sw t0, sp, 8

it would become evident that this instruction takes three operands, not just two.
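The Python sketch below (illustrative only; the helper names s_type and decode_s are hypothetical) encodes sw t0, 8(sp) with the RV32I S-type layout and decodes it again. It makes the three operands visible as the fields rs2 (t0), rs1 (sp) and imm (8), with the immediate split across bits 31:25 and 11:7, which is where the other formats keep rd.

```python
# Hedged sketch: encode and decode an RV32I S-type store.
# Layout: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
# There is no rd field: bits 11:7 hold the low immediate bits instead.
STORE_OPCODE = 0x23
FUNCT3_SW = 0b010


def s_type(funct3: int, rs1: int, rs2: int, imm: int, opcode: int = STORE_OPCODE) -> int:
    imm &= 0xFFF  # 12-bit two's-complement immediate
    return (
        (((imm >> 5) & 0x7F) << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)
        | opcode
    )


def decode_s(word: int) -> dict:
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    if imm & 0x800:  # sign-extend the reassembled 12-bit immediate
        imm -= 0x1000
    return {
        "rs2": (word >> 20) & 0x1F,  # register whose value is stored (t0)
        "rs1": (word >> 15) & 0x1F,  # base address register (sp)
        "imm": imm,                  # address offset (8)
        "funct3": (word >> 12) & 0x7,
        "opcode": word & 0x7F,
    }


SP, T0 = 2, 5  # ABI register numbers
word = s_type(FUNCT3_SW, rs1=SP, rs2=T0, imm=8)  # sw t0, 8(sp)
print(hex(word), decode_s(word))
```

Both registers occupy source-register fields (rs1 and rs2); nothing in the word names a register to be written.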

The register sp is not modified, only read, so it cannot be considered a destination register. Like t0, whose contents the store instruction writes to memory, sp is a source register. The destination operand is memory, since that is what receives the contents of t0.

The S-type instruction format doesn't encode a destination operand. What it does encode is addressing information about the destination operand. For sw t0, 8(sp), the destination operand is the word in memory at the effective address that the store instruction calculates from sp and 8. The register sp holds part of the addressing information for that word in memory (i.e., the destination operand).
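To separate the destination operand from its addressing information, here is a small Python model of the store's run-time effect. It is a sketch, not a simulator: the execute_sw helper, the register values and the memory size are hypothetical, the immediate is assumed to be already sign-extended, and memory is assumed to be little-endian (the usual RISC-V configuration).

```python
import struct


def execute_sw(regs: list, mem: bytearray, rs1: int, rs2: int, imm: int) -> None:
    """M[x[rs1] + imm] <- x[rs2][31:0]; only memory is written, no register."""
    effective_address = (regs[rs1] + imm) & 0xFFFFFFFF  # wraps like RV32 arithmetic
    struct.pack_into("<I", mem, effective_address, regs[rs2] & 0xFFFFFFFF)


regs = [0] * 32
regs[2] = 0x100         # sp (hypothetical stack pointer value)
regs[5] = 0xDEADBEEF    # t0 (hypothetical value to store)
mem = bytearray(0x200)  # tiny stand-in for memory
before = list(regs)

execute_sw(regs, mem, rs1=2, rs2=5, imm=8)  # sw t0, 8(sp)

stored = struct.unpack_from("<I", mem, 0x108)[0]
print(hex(stored))      # 0xdeadbeef: the destination is the word at sp + 8
print(regs == before)   # True: sp and t0 are only read
```

The register file comes out unchanged; the only thing written is the memory word whose address was derived from sp and 8.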

Summary

Assembly instructions in RISC-V that encode a destination operand have that operand first. A store instruction, however, doesn't encode a destination operand. Its destination operand is a location in memory, whose address is computed from two of the instruction's source operands: the base register and the immediate offset.


¹We could argue that the jal ra, off instruction above has an additional destination operand, namely pc, because pc is updated as follows: pc ← pc + SignExtension(off). However, executing any other instruction also modifies pc, e.g., by incrementing it by four (the update may differ for branches and jalr). In any case, pc is not encoded in any instruction, and it is not directly accessible to the programmer as a register. Therefore, it is not relevant to this discussion. For the same reason, I've also omitted the B-type format from this discussion.

²Or just the other way around: imagine you could express addi t0, t0, -1 as addi t0, -1(t0). Would you then say that addi takes two operands (e.g., t0 and -1(t0))?

ネロク・ゴ answered Sep 28 '22 00:09