Most instructions in RISC-V assembly put the destination operand before the source operands, e.g.:
li t0, 22 # destination, source
li t1, 1 # destination, source
add t2, t0, t1 # destination, source, source
But the store instructions have that order reversed:
sb t0, (sp) # source, destination
lw t1, (a0) # destination, source
vlb.v v4, (a1) # destination, source
vsb.v v5, (a2) # source, destination
How come?
What is the motivation for this (arguably) asymmetric assembler syntax design?
I don't see a real inconsistency in RISC-V assembly when it comes to destination and source operands: The destination operand – when it's part of the instruction encoding – always corresponds to the first operand in the assembly language.
Consider the following instruction examples, taken from four of the six instruction formats:
add t0, t1, t2 # R-type
addi t0, t1, 1 # I-type
jal ra, off # J-type, see footnote [1]
lui t0, 0x12345 # U-type
In the assembly instructions above, the destination operand is the first operand. Clearly, this destination operand corresponds to the destination register (rd) in the instruction encoding.
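As a rough check of that claim, the four examples can be encoded by hand and the rd field (bits 11:7 in the R, I, U and J formats) read back out. This is only a minimal Python sketch: the helper names are made up for illustration, the jal offset of 8 is an arbitrary stand-in for off, and the register numbers assume the standard ABI names (ra = x1, t0 = x5, t1 = x6, t2 = x7).

```python
# Hypothetical helpers for illustration; not part of any RISC-V toolchain.
RA, T0, T1, T2 = 1, 5, 6, 7  # ABI names -> x-register numbers


def r_type(funct7, rs2, rs1, funct3, rd, opcode):
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def i_type(imm, rs1, funct3, rd, opcode):
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def u_type(imm20, rd, opcode):
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode


def j_type(offset, rd, opcode):
    # J-type scatters the offset as imm[20|10:1|11|19:12] over bits 31:12.
    imm = offset & 0x1FFFFF
    return (
        (((imm >> 20) & 0x1) << 31)
        | (((imm >> 1) & 0x3FF) << 21)
        | (((imm >> 11) & 0x1) << 20)
        | (((imm >> 12) & 0xFF) << 12)
        | (rd << 7)
        | opcode
    )


def rd_field(word):
    """Extract bits 11:7, which hold rd in the R, I, U and J formats."""
    return (word >> 7) & 0x1F


EXAMPLES = {
    "add t0, t1, t2": r_type(0b0000000, T2, T1, 0b000, T0, 0b0110011),
    "addi t0, t1, 1": i_type(1, T1, 0b000, T0, 0b0010011),
    "jal ra, 8": j_type(8, RA, 0b1101111),  # 8 is an assumed value for off
    "lui t0, 0x12345": u_type(0x12345, T0, 0b0110111),
}

for text, word in EXAMPLES.items():
    print(f"{text:<16} 0x{word:08x}  rd = x{rd_field(word)}")
```

In every case the register written first in the assembly text is the one that lands in the rd field.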
Now, let's focus on the store instructions (S-type format). As an example, consider the following store instruction:
sw t0, 8(sp)
I think it is crystal clear that t0 above is a source operand, since the store instruction stores its contents in memory.
It is tempting to think that 8(sp) is a destination operand. However, a close look at the S-type instruction format, whose fields from bit 31 down to bit 0 are imm[11:5], rs2, rs1, funct3, imm[4:0] and opcode, shows that the 8(sp) part of the assembly instruction above isn't really a single operand but actually two: the immediate 8 (imm) and the source register sp (rs1). If the instruction could instead be written like this (similar to addi; see footnote [2]):
sw t0, sp, 8
it would become evident that this instruction takes three operands, not just two.
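To make the three-operand view concrete, here is a minimal Python sketch that packs sw t0, 8(sp) into the S-type layout and unpacks it again. The function names are hypothetical, and the register numbers assume the standard ABI names (sp = x2, t0 = x5).

```python
# Hypothetical helper names; a sketch of the S-type layout, not a real assembler.
SP, T0 = 2, 5  # ABI names -> x-register numbers


def s_type(imm, rs2, rs1, funct3, opcode=0b0100011):
    """Pack an S-type word: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)  # bits 11:7 carry imm[4:0], not an rd field
        | opcode
    )


def s_fields(word):
    """Unpack the three operands of an S-type word: rs1, rs2 and the signed imm."""
    imm = ((word >> 25) << 5) | ((word >> 7) & 0x1F)
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {"rs1": (word >> 15) & 0x1F, "rs2": (word >> 20) & 0x1F, "imm": imm}


word = s_type(8, rs2=T0, rs1=SP, funct3=0b010)  # sw t0, 8(sp)
print(f"0x{word:08x}", s_fields(word))
```

The unpacked word contains two source registers and an immediate. The bit positions that hold rd in the other formats are occupied by the low bits of the immediate.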
The register sp is not modified, only read; it therefore can't be considered a destination register. It is also a source register, just as t0 is, the register whose contents the store instruction stores in memory. Memory is the destination operand, since it is what receives the contents of t0.
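The same point can be seen in a toy model of the store's behavior. This is a sketch under stated assumptions rather than a real simulator: the sw function, the dictionary register file and the 64-byte memory are invented for illustration, and a 32-bit address space (RV32) is assumed.

```python
# Toy model; names and layout are illustrative, not a real simulator API.
def sw(regs, mem, rs2, imm, rs1):
    """Model sw rs2, imm(rs1): write the low 32 bits of rs2 to memory, little-endian."""
    addr = (regs[rs1] + imm) & 0xFFFFFFFF  # effective address, assuming RV32
    mem[addr:addr + 4] = (regs[rs2] & 0xFFFFFFFF).to_bytes(4, "little")


regs = {"sp": 0x10, "t0": 0xDEADBEEF}
mem = bytearray(64)
before = dict(regs)

sw(regs, mem, "t0", 8, "sp")  # sw t0, 8(sp)

print(regs == before)  # both registers were only read
print(hex(int.from_bytes(mem[0x18:0x1C], "little")))  # the word at sp + 8
```

Only the memory changes; sp and t0 hold the same values before and after the call.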
The S-type instruction format doesn't encode a destination operand. What the instruction does encode is addressing information for the destination operand. For sw t0, 8(sp), the destination operand is the word in memory at the effective address that the store instruction computes from sp and 8. The register sp holds part of that addressing information about the word in memory (i.e., the destination operand).
Assembly instructions in RISC-V that encode a destination operand have this operand as the first one. A store instruction, however, doesn't encode a destination operand. Its destination operand is a location in memory, and the address of this location is computed from the instruction's source operands (the base register and the immediate).
[1] We could possibly argue that the jal ra, off instruction above has an additional destination operand, namely pc, because pc is updated in the following way: pc ← pc + SignExtension(off).
However, executing any other instruction also results in modifying pc, e.g., incrementing pc by four (this may be different for branches and jalr). In any case, pc is not encoded in any instruction, and it is not directly accessible to the programmer as a register. Therefore, it is not of interest to the discussion. For the same reason, I've also omitted the B-type format from this discussion.
[2] Or just the other way around: imagine you could write addi t0, t0, -1 as addi t0, -1(t0). Would you then say that addi takes two operands (namely t0 and -1(t0))?