
RISC-V: PC Absolute vs PC Relative

Tags:

assembly

riscv

I am new to RISC-V.

I am having trouble comprehending when to write PC (Program Counter) relative instructions and when to write PC absolute instructions.

For example, a lui followed by a jalr is considered PC-absolute, and an auipc followed by a jalr is considered PC-relative.

To my understanding, all instructions are executed via the PC, so such PC-absolute instructions seem to be hidden from it (i.e. they work without knowledge of the PC).

To me, those PC-absolute instructions will not be executed.

Can someone provide some basic examples to help me understand this?

c3r0 asked Dec 06 '22 11:12


2 Answers

I think the issue you're having is this concept of "PC-absolute", which isn't actually a thing. Your options are "PC relative" and "absolute". RISC-V defines two addressing instructions that allow these modes to be implemented efficiently:

  • lui (Load Upper Immediate): this sets rd to a 32-bit value with the low 12 bits being 0 and the high 20 bits coming from the U-type immediate.
  • auipc (Add Upper Immediate to Program Counter): this sets rd to the sum of the current PC and a 32-bit value with the low 12 bits as 0 and the high 20 bits coming from the U-type immediate.

These instructions are essentially the same: they both take a U-type immediate (i.e., the high 20 bits of a 32-bit quantity), add it to something, and produce the result in rd. The difference is that lui adds that immediate to 0, while auipc adds that immediate to the PC. Sometimes it's easier to think of the two addressing modes as "PC-relative" and "0-relative", as that makes the distinction a bit more explicit.

While both auipc and lui are designed to work as the first instruction in a two-instruction pair, the second instruction isn't particularly relevant. Both auipc and lui fill out the high 20 bits of a 32-bit address, leaving the instruction they're paired with to fill out the low 12 bits. The I and S formatted instructions are designed to pair well here, and there's an I or S variant of every instruction in the base ISA for which such a format would make sense.

As a concrete example, the following C code performs a very simple load of a global variable:

int global;
int func(void) { return global; }

As an example, let's assume that global is at 0x20000004 and the PC of the first instruction in func is 0x10000008.

When compiled with -mcmodel=medlow (a 0-relative addressing mode), you'll get

func:
    lui a0, 0x20000        # a0 = 0x20000 << 12 = 0x20000000
    lw  a0, 0x004(a0)      # load from 0x20000000 + 4 = 0x20000004
    ret

As you can see, the full absolute address of global (0x20000004) is filled into the instruction pair. On the other hand, when compiled with -mcmodel=medany (a PC-relative addressing mode) you'll get

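func:
    auipc a0, 0x10000      # a0 = PC + (0x10000 << 12) = 0x20000008
    lw    a0, -4(a0)       # load from 0x20000008 - 4 = 0x20000004
    ret

This time, only the offset between the PC of the auipc and the target symbol (0x20000004 - 0x10000008 = 0x0FFFFFFC) appears in the instruction pair. This happens because the PC is explicitly (by the use of the auipc instruction) included in the addressing calculation. In this case, that auipc is setting a0 to 0x20000008: the calculation performed is a0 = PC + (imm20 << 12), and here we have 0x10000008 for the PC and 0x10000 for imm20. The lw then supplies the remaining -4, which is sign-extended like every I-type immediate, to reach 0x20000004.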

These PC-relative addressing sequences also allow a modicum of position independence: if you're very careful to limit what you're doing, linked binaries can be produced that will still work when loaded at a different offset from where they're linked. In practice this isn't sufficient for full position-independent addressing in POSIX-style systems (which is why we also have a -fPIC argument, like everyone else does), but if you're in a tightly constrained embedded system you might be able to get away with it.

For the final wrinkle: like pretty much everything else in the RISC-V ISA, the immediates used by auipc and lui are sign extended to XLEN. On 32-bit systems these addressing modes can generate any address in the system, but that's not the case for 64-bit systems. That's why we call these addressing modes "medany" and "medlow": the "med" stands for "medium", which means a 4 GiB window into which all global symbols must fit. The "low" part means this window is centered around the absolute address 0, while the "any" part means this window is centered around whatever PC this is linked at.

Palmer Dabbelt answered Jan 07 '23 17:01


PC-relative
absolute

You call some instruction (or code) "PC relative" if the addresses are calculated relative to the address of the code itself.

You call an instruction "absolute" when an address is not calculated relative to the address of the instruction itself.

Unfortunately I don't know the RISC-V CPU, but the following example for the (old) 68000 CPU shows what is meant:

x:
    lea.l (PC+y-x-2), a0
    lea.l (y).l, a0
  ...
y:

Both instructions will load the address y into the register a0.

However there is a difference:

Suppose the code is located at address 0x1000 and the label y is located at address 0x2000.

Now we move the code to address 0x1200 and execute the code there. What will happen?

The first instruction will load the address 0x2200 to the register:

The address is calculated relative to the address of the instruction: it is calculated as (address of the instruction)+0x1000. Because the instruction is now located at address 0x1200 instead of 0x1000, the value written to the register will be 0x2200, not 0x2000.

This is called (PC) relative addressing.

The second instruction will load the address 0x2000 into the register. It always loads the value 0x2000 into the register - the address of the instruction itself does not matter.

This is called absolute addressing.

Martin Rosenau answered Jan 07 '23 16:01