I'm trying to replace certain methods with asm-implementations. Target is arm64 on iOS (iPhone 5S or newer). I want to use a dedicated assembler-file, as the inline assembler comes with additional overhead, and is quite cumbersome to use with A64 memory offsets.
There is not much documentation on this on the Internet, so I'm unsure whether my approach is the right one. Therefore, I'll describe the process I followed to move a function to ASM.
The candidate function for this question is a 256-bit integer comparison function.
UInt256.h
@import Foundation;
typedef struct {
uint64_t value[4];
} UInt256;
bool eq256(const UInt256 *lhs, const UInt256 *rhs);
Bridging-Header.h
#import "UInt256.h"
Reference implementation (Swift)
let result = x.value.0 == y.value.0
&& x.value.1 == y.value.1
&& x.value.2 == y.value.2
&& x.value.3 == y.value.3
UInt256.s
.globl _eq256
.align 2
_eq256:
ldp x9, x10, [x0]
ldp x11, x12, [x1]
cmp x9, x11
ccmp x10, x12, 0, eq
ldp x9, x10, [x0, 16]
ldp x11, x12, [x1, 16]
ccmp x9, x11, 0, eq
ccmp x10, x12, 0, eq
cset x0, eq
ret
Resources I found
Section 5.1.1 of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) document explains the purpose of each register during procedure calls.
iOS specific deviations.
iOS Assembler Directives.
Questions
I've tested the code using XCTest, creating two random numbers, running both the Swift and the Asm implementations on them and verifying that both report the same result. The code seems to be correct.
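One caveat about the random-input test described above: two random 256-bit values are practically never equal, so the path where the function must return true is barely exercised. A minimal sketch of a more targeted check in C follows. The names `eq256_ref`, `eq256_fn` and `check_eq256` are hypothetical and not part of the original project, and the struct is repeated only so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as UInt256.h; repeated so the sketch compiles on its own. */
typedef struct {
    uint64_t value[4];
} UInt256;

/* Signature shared by the reference and the implementation under test. */
typedef bool (*eq256_fn)(const UInt256 *lhs, const UInt256 *rhs);

/* Portable reference: limb-by-limb comparison, mirroring the Swift version. */
bool eq256_ref(const UInt256 *lhs, const UInt256 *rhs) {
    for (int i = 0; i < 4; i++) {
        if (lhs->value[i] != rhs->value[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical helper: returns how many test cases `impl` gets wrong. */
int check_eq256(eq256_fn impl) {
    assert(impl != 0);
    int failures = 0;
    const UInt256 a = {{0x0123456789abcdefULL, 0xfedcba9876543210ULL,
                        0x0ULL, 0xffffffffffffffffULL}};

    /* Equal operands: the case random inputs almost never hit. */
    UInt256 b = a;
    if (!impl(&a, &b)) {
        failures++;
    }

    /* Flip the lowest and the highest bit of each limb in turn. */
    for (int limb = 0; limb < 4; limb++) {
        for (int bit = 0; bit < 64; bit += 63) {
            b = a;
            b.value[limb] ^= (uint64_t)1 << bit;
            if (impl(&a, &b) != eq256_ref(&a, &b)) {
                failures++;
            }
        }
    }
    return failures;
}
```

On the device, passing the assembly symbol (`check_eq256(eq256)`) should return 0 if the implementation agrees with the reference on these cases.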
In the asm file: The .align seems to be for optimization - is this really necessary, and if yes, what is the correct value to align to?
Is there any source that clearly explains what the calling convention for my specific function signature is?
a. How can I know that the inputs are actually passed via x0 and x1?
b. How can I know that it is correct to pass the output in x0?
c. How can I know that it is safe to clobber x9-x12 and the status registers?
d. Is the function called the same way when I call it from C instead of Swift?
What does "Indirect result location register" mean for the r8 register description in the ARM document?
Do I need any other assembler directives besides .globl?
When I set breakpoints, the debugger seems to get confused where it actually is, showing incorrect lines etc. Am I doing something wrong?
The .align 2 directive is required for program correctness. A64 instructions need to be aligned on 32-bit boundaries.
lhs and rhs get stored in X0 and X1 by following the instructions given in section 5.4.2 (Parameter Passing Rules) of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) document you linked. Since the parameters are both pointers, the only specific rule that applies is C.7.
I should say one advantage of writing your code using inline assembly is that you wouldn't have to worry about any of this. Something like the following untested C code shouldn't be too unwieldy:
bool eq256(const UInt256 *lhs, const UInt256 *rhs) {
const __int128 *lv = (__int128 const *) lhs->value;
const __int128 *rv = (__int128 const *) rhs->value;
uint64_t l1, l2, r1, r2, ret;
asm("ldp %1, %2, %5\n\t"
"ldp %3, %4, %6\n\t"
"cmp %1, %3\n\t"
"ccmp %2, %4, 0, eq\n\t"
"ldp %1, %2, %7\n\t"
"ldp %3, %4, %8\n\t"
"ccmp %1, %3, 0, eq\n\t"
"ccmp %2, %4, 0, eq\n\t"
"cset %0, eq\n\t"
/* "&" marks early-clobber: these are written before all inputs have been read */
: "=r" (ret), "=&r" (l1), "=&r" (l2), "=&r" (r1), "=&r" (r2)
: "Ump" (lv[0]), "Ump" (rv[0]), "Ump" (lv[1]), "Ump" (rv[1])
: "cc");
return ret;
}
Ok, maybe it's a little unwieldy.
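For comparison, here is a sketch of a plain C version with no assembly at all, which leaves register allocation and the calling convention entirely to the compiler. The name `eq256_plain` is hypothetical, and whether the compiler emits ldp/ccmp sequences for it is not guaranteed, so it is worth checking the generated code before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as UInt256.h; repeated so the sketch compiles on its own. */
typedef struct {
    uint64_t value[4];
} UInt256;

/* Branch-free comparison: XOR each pair of limbs and OR the differences
   together; the accumulated value is zero only if all four limbs match. */
bool eq256_plain(const UInt256 *lhs, const UInt256 *rhs) {
    assert(lhs && rhs);
    uint64_t diff = 0;
    for (int i = 0; i < 4; i++) {
        diff |= lhs->value[i] ^ rhs->value[i];
    }
    return diff == 0;
}
```

Because this is ordinary C, it is called the same way from C and from Swift through the bridging header, and the debugger can map it to source lines as usual.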