I was trying to read RISC-V assembly generated by gcc and I found that gcc creates a sequence of auipc+jalr for some function calls, and I don't understand how it works. Here's a simple example. Consider the following C source file:
unsigned long id(unsigned long x) {
    return x;
}

unsigned long add_one(unsigned long x) {
    return id(x) + 1;
}
I compile it with gcc -O2 -fno-inline -c test.c and I get the following assembly code:
$ objdump -d test.o
test.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <id>:
0: 00008067 ret
0000000000000004 <add_one>:
4: ff010113 addi sp,sp,-16
8: 00113423 sd ra,8(sp)
c: 00000317 auipc t1,0x0
10: 000300e7 jalr t1
14: 00813083 ld ra,8(sp)
18: 00150513 addi a0,a0,1
1c: 01010113 addi sp,sp,16
20: 00008067 ret
What confuses me are the two lines at offsets 0x0c and 0x10, which is where the function id is supposed to be called. According to the spec, auipc t1,0x0 should write PC + (0x0<<12) (which is equal to PC) to t1, and then jalr t1 (which gets expanded to jalr ra,t1,0) jumps to the address stored in t1 and stores the return address to ra. So we end up jumping to the auipc line (offset 0x0c), not the entry point of id. What's going on here?
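The reading of the spec above can be checked mechanically. The following is a minimal C sketch, not taken from any toolchain; the helper names auipc_imm, jalr_imm and call_target are made up for illustration. It computes where an auipc/jalr pair would jump if the encoded immediates were taken at face value.

```c
#include <assert.h>
#include <stdint.h>

/* U-type immediate of auipc: instruction bits 31:12, sign-extended.
 * The low 12 bits of the result are zero. */
static int64_t auipc_imm(uint32_t insn) {
    int64_t v = (int64_t)(insn & 0xfffff000u);
    if (v & INT64_C(0x80000000)) v -= INT64_C(0x100000000);
    return v;
}

/* I-type immediate of jalr: instruction bits 31:20, sign-extended. */
static int64_t jalr_imm(uint32_t insn) {
    int64_t v = (int64_t)(insn >> 20);
    if (v & 0x800) v -= 0x1000;
    return v;
}

/* Jump target of an auipc/jalr pair whose auipc sits at address pc
 * (hypothetical helper; assumes jalr uses the register written by auipc). */
static uint64_t call_target(uint64_t pc, uint32_t auipc, uint32_t jalr) {
    assert((auipc & 0x7f) == 0x17);    /* opcode AUIPC */
    assert((jalr & 0x707f) == 0x67);   /* opcode JALR, funct3 = 0 */
    uint64_t base = pc + (uint64_t)auipc_imm(auipc);
    /* jalr clears the least significant bit of the computed address */
    return (base + (uint64_t)jalr_imm(jalr)) & ~(uint64_t)1;
}
```

Calling call_target(0xc, 0x00000317, 0x000300e7) with the two encodings from the dump yields 0xc, i.e. the address of the auipc itself, which confirms that both immediates in the object file are zero.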
When disassembling an object file, the displayed address information in auipc/jalr is kind of arbitrary because it gets relocated by the linker anyway. The zero immediates are just placeholders.
You can see that when also dumping the relocation information (add -r to your objdump call):
0000000000000000 <id>:
0: 8082 ret
0000000000000002 <add_one>:
2: 1141 addi sp,sp,-16
4: e406 sd ra,8(sp)
6: 00000097 auipc ra,0x0
6: R_RISCV_CALL id
6: R_RISCV_RELAX *ABS*
a: 000080e7 jalr ra # 6 <add_one+0x4>
e: 60a2 ld ra,8(sp)
10: 0505 addi a0,a0,1
12: 0141 addi sp,sp,16
14: 8082 ret
Those relocation entries tell the linker to relocate the jump instructions in a relaxed fashion (the default for the RISC-V toolchain). That means it's allowed to replace auipc+jalr pairs with just one jal instruction iff the distance to the target address is short enough, i.e. within the roughly ±1 MiB range that a jal offset can encode. Such replacements are advantageous because they save instructions, i.e. the resulting program is shorter. Obviously, this complicates the relocation procedure a bit, because the offsets of following jump instructions need to be adjusted accordingly.
(This can be disabled with the -mno-relax GCC flag.)
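To make "short enough" concrete, here is a minimal C sketch of the range condition only. The function name fits_jal is hypothetical, and a real linker checks more than this (for example, addresses shift while other call sites are being relaxed), so treat it as an illustration of the jal encoding limit rather than of the linker's actual algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* jal encodes a signed 21-bit offset in multiples of 2 bytes,
 * so the reachable range is [-2^20, 2^20 - 2] relative to the jal itself. */
static bool fits_jal(uint64_t pc, uint64_t target) {
    assert((pc & 1) == 0);                 /* instructions are 2-byte aligned */
    int64_t off = (int64_t)(target - pc);  /* signed distance in bytes */
    return off >= -(INT64_C(1) << 20)
        && off <   (INT64_C(1) << 20)
        && (off & 1) == 0;
}
```

For the example in this answer, fits_jal(0x10162, 0x1015c) is true (the distance is only -6 bytes), which is why the linked binary ends up with a single jal.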
Why can't the assembler directly emit final auipc/jalr/jal instructions for symbols local to the translation unit that don't need to be relocated? After all, those jumps are pc-relative.
In general it can't, because with just the local view of one translation unit 1) a relaxed relocation to an external symbol may change all following offsets to internal symbols and 2) the linker might even apply some advanced rule, e.g. one where an internal symbol is overlaid by an external one, such that it really has to be relocated by the linker, or one where the linker deletes a symbol.
If you want to look at relocated addresses/offsets you have to disassemble the linked binary, e.g.:
000000000001015c <id>:
1015c: 8082 ret
000000000001015e <add_one>:
1015e: 1141 addi sp,sp,-16
10160: e406 sd ra,8(sp)
10162: ffbff0ef jal ra,1015c <id>
10166: 60a2 ld ra,8(sp)
10168: 0505 addi a0,a0,1
1016a: 0141 addi sp,sp,16
1016c: 8082 ret
As expected, the linker relaxes auipc+jalr to just jal. Unfortunately, objdump doesn't display the raw jal offset: 1015c is the absolute address obtained by adding the offset to 10162. [1]
You can verify this by decoding the binary instruction in the second column yourself:
0xffbff0ef
= 0b11111111101111111111000011101111 | split into the offset parts
=> 1 1111111101 1 11111111 | i.e. off[20], off[10:1], off[11], off[19:12]
| merge them into off[20:1]
=> 0b11111111111111111101 | left-shift by 1
=> 0b111111111111111111010 | sign-extend
=> 0b11111111111111111111111111111010
= -6
=> 0x10162 - 6
= 0x1015c
Which matches the objdump output.
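The same manual decoding can be written as a small C function. This is a sketch under the assumption that the input is a 32-bit (uncompressed) jal encoding; the name jal_offset is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the signed byte offset from a 32-bit jal instruction (J-type).
 * The immediate is stored scrambled as off[20], off[10:1], off[11], off[19:12]
 * in instruction bits 31, 30:21, 20 and 19:12. */
static int32_t jal_offset(uint32_t insn) {
    assert((insn & 0x7f) == 0x6f);          /* opcode must be JAL */
    uint32_t off = 0;
    off |= ((insn >> 31) & 0x1)   << 20;    /* off[20]    */
    off |= ((insn >> 21) & 0x3ff) << 1;     /* off[10:1]  */
    off |= ((insn >> 20) & 0x1)   << 11;    /* off[11]    */
    off |= ((insn >> 12) & 0xff)  << 12;    /* off[19:12] */
    int32_t s = (int32_t)off;               /* at most 21 bits, so in range */
    if (s & (1 << 20)) s -= (1 << 21);      /* sign-extend from bit 20 */
    return s;
}
```

With the instruction from the linked binary, jal_offset(0xffbff0ef) returns -6, and 0x10162 + (-6) gives 0x1015c, the address of id.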
[1] That means GNU binutils objdump doesn't display the raw jal offset. In contrast, llvm-objdump (LLVM 9 introduces official RISC-V support) does display the raw offset:
000000000001015e add_one:
1015e: 41 11 addi sp, sp, -16
10160: 06 e4 sd ra, 8(sp)
10162: ef f0 bf ff jal -6
10166: a2 60 ld ra, 8(sp)
10168: 05 05 addi a0, a0, 1
1016a: 41 01 addi sp, sp, 16
1016c: 82 80 ret
However, in contrast to GNU binutils objdump, llvm-objdump doesn't include the resulting absolute address as an annotation. Neither does it annotate the corresponding symbol. Thus, the GNU binutils objdump output is arguably more useful in general.