
Assembler on 64-bit iOS (A64)

I'm trying to replace certain methods with asm-implementations. Target is arm64 on iOS (iPhone 5S or newer). I want to use a dedicated assembler-file, as the inline assembler comes with additional overhead, and is quite cumbersome to use with A64 memory offsets.

There isn't much documentation on this on the Internet, so I'm unsure whether my approach is the right one. Therefore, I'll describe the process I followed to move a function to assembly.


The candidate function for this question is a 256-bit integer comparison function.

UInt256.h

@import Foundation;

typedef struct {
    uint64_t value[4];
} UInt256;

bool eq256(const UInt256 *lhs, const UInt256 *rhs);

Bridging-Header.h

#import "UInt256.h"

Reference implementation (Swift)

let result = x.value.0 == y.value.0
          && x.value.1 == y.value.1
          && x.value.2 == y.value.2
          && x.value.3 == y.value.3

UInt256.s

.globl _eq256
.align 2
_eq256:
    ldp        x9, x10, [x0]
    ldp       x11, x12, [x1]
    cmp        x9, x11
    ccmp      x10, x12, 0, eq
    ldp        x9, x10, [x0, 16]
    ldp       x11, x12, [x1, 16]
    ccmp       x9, x11, 0, eq
    ccmp      x10, x12, 0, eq
    cset       x0, eq
    ret
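For cross-checking the assembly routine from C, a portable equivalent can be handy. The following is only a minimal sketch: the name eq256_ref is hypothetical and not part of the original post, and it reproduces the result of the routine above rather than its cmp/ccmp instruction sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the UInt256 struct declared in UInt256.h. */
typedef struct {
    uint64_t value[4];
} UInt256;

/* Hypothetical reference version (eq256_ref is an assumed name).
   XOR of two limbs is zero only when they are equal, so OR-ing the
   four XOR results gives zero exactly when all limbs match. */
bool eq256_ref(const UInt256 *lhs, const UInt256 *rhs) {
    uint64_t diff = 0;
    for (int i = 0; i < 4; i++) {
        diff |= lhs->value[i] ^ rhs->value[i];
    }
    return diff == 0;
}
```

Like the assembly version, this touches all four limbs without branching on intermediate results.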

Resources I found

  • Section 5.1.1 of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) document explains the purpose of each register during procedure calls.

  • iOS specific deviations.

  • iOS Assembler Directives.


Questions

I've tested the code using XCTest, creating two random numbers, running both the Swift and the Asm implementations on them and verifying that both report the same result. The code seems to be correct.

  1. In the asm file: The .align seems to be for optimization - is this really necessary, and if yes, what is the correct value to align to?

  2. Is there any source that clearly explains what the calling convention for my specific function signature is?

    a. How can I know that the inputs are actually passed via x0 and x1?

    b. How can I know that it is correct to pass the output in x0?

    c. How can I know that it is safe to clobber x9-x12 and the status registers?

    d. Is the function called the same way when I call it from C instead of Swift?

  3. What does "Indirect result location register" mean for the r8 register description in the ARM document?

  4. Do I need any other assembler directives besides .globl?

  5. When I set breakpoints, the debugger seems to get confused where it actually is, showing incorrect lines etc. Am I doing something wrong?
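The testing approach described at the start of this list can be sketched in C as well. This is an illustrative harness, not the XCTest code from the question: the names eq256_baseline, eq256_first_limb_only and count_mismatches are hypothetical. Two independently random 256-bit numbers are almost never equal, so the sketch also checks an identical pair and a pair that differs in exactly one limb.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t value[4];
} UInt256;

typedef bool (*eq256_fn)(const UInt256 *, const UInt256 *);

/* Trusted baseline: plain limb-by-limb comparison. */
static bool eq256_baseline(const UInt256 *lhs, const UInt256 *rhs) {
    return lhs->value[0] == rhs->value[0]
        && lhs->value[1] == rhs->value[1]
        && lhs->value[2] == rhs->value[2]
        && lhs->value[3] == rhs->value[3];
}

/* Deliberately broken candidate, only to show the harness catches bugs. */
bool eq256_first_limb_only(const UInt256 *lhs, const UInt256 *rhs) {
    return lhs->value[0] == rhs->value[0];
}

/* Builds a 64-bit value from eight calls to rand(). */
static uint64_t random_u64(void) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (uint64_t)(rand() & 0xFF);
    }
    return r;
}

/* Returns how often the candidate disagrees with the baseline. */
int count_mismatches(eq256_fn candidate, unsigned seed, int rounds) {
    int mismatches = 0;
    srand(seed);
    for (int n = 0; n < rounds; n++) {
        UInt256 a, b;
        for (int i = 0; i < 4; i++) a.value[i] = random_u64();

        b = a;                      /* identical pair */
        if (candidate(&a, &b) != eq256_baseline(&a, &b)) mismatches++;

        b.value[n % 4] ^= 1;        /* differs in exactly one limb */
        if (candidate(&a, &b) != eq256_baseline(&a, &b)) mismatches++;

        for (int i = 0; i < 4; i++) b.value[i] = random_u64();  /* unrelated pair */
        if (candidate(&a, &b) != eq256_baseline(&a, &b)) mismatches++;
    }
    return mismatches;
}
```

In the real project the candidate would be the eq256 symbol from UInt256.s; a result of zero mismatches is evidence of agreement on the sampled inputs, not a proof of correctness.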

Etan asked Jun 19 '15 21:06



1 Answer

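  1. The .align 2 directive is required for program correctness. A64 instructions need to be aligned on 32-bit boundaries. With this assembler the operand is a power of two, so .align 2 requests 4-byte alignment, which is the correct value.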
  2. The documentation you linked seems clear to me and unfortunately this isn't the place to ask for recommendations.
    • You can determine that the parameters lhs and rhs are passed in X0 and X1 by following the rules given in section 5.4.2 (Parameter Passing Rules) of the Procedure Call Standard for the ARM 64-bit Architecture (AArch64) document you linked. Since the parameters are both pointers, the only specific rule that applies is C.7.
    • You can determine which register is used to return values by following the rules given in section 5.5 (Result Return). This just has you follow the same rules as for parameters. Since the function returns an integer, only rule C.7 applies, so the value is returned in X0.
    • It's safe to change the values stored in registers X9 through X12 because they're listed as temporary registers in the table given in section 5.1.1 (General-purpose Registers).
    • The question is really whether the function is called the same way in Swift as in C. Both the Procedure Call Standard document and the Apple specific exceptions document you linked are defined in terms of C and C++. Presumably Swift follows the same conventions but I don't know if Apple has made that explicit anywhere.
  3. The purpose of R8 is described in section 5.5 (Result Return). It's used when the return value is too big to fit into the registers used to return values. In that case the caller creates a buffer for the return value and puts its address in R8. The function then copies the return value into this buffer.
  4. I don't believe you need anything else in your example assembly program.
  5. You've asked too many questions. You should post a separate and more detailed question describing your problem.
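The indirect result rule from point 3 can be illustrated with two small C functions. This is a sketch under assumptions: the UInt128 type and the names make256 and make128 are hypothetical, and the register use stated in the comments follows the AAPCS64 rules for composite types rather than anything checked on a device. Inspecting the compiler's assembly output for an arm64 target (for example with the -S option) is one way to confirm it.

```c
#include <stdint.h>

typedef struct {
    uint64_t value[4];   /* 32 bytes */
} UInt256;

typedef struct {
    uint64_t lo, hi;     /* 16 bytes; hypothetical type for comparison */
} UInt128;

/* Larger than 16 bytes: under AAPCS64 the caller allocates the result
   buffer and passes its address in x8; the callee writes into it. */
UInt256 make256(uint64_t limb) {
    UInt256 r = {{limb, limb, limb, limb}};
    return r;
}

/* Fits in 16 bytes: under AAPCS64 the result comes back in x0 and x1,
   so x8 is not involved. */
UInt128 make128(uint64_t limb) {
    UInt128 r = {limb, limb};
    return r;
}
```

The eq256 function in the question returns a bool, so it falls in the second category and never needs x8.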

I should say one advantage of writing your code using inline assembly is that you wouldn't have to worry about any of this. Something like the following untested C code shouldn't be too unwieldy:

bool eq256(const UInt256 *lhs, const UInt256 *rhs) {
     /* View each 256-bit value as two 128-bit halves so that every
        memory operand covers exactly one ldp register pair. */
     const __int128 *lv = (__int128 const *) lhs->value;
     const __int128 *rv = (__int128 const *) rhs->value;

     uint64_t l1, l2, r1, r2, ret;

     asm("ldp       %1, %2, %5\n\t"
         "ldp       %3, %4, %6\n\t"
         "cmp       %1, %3\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "ldp       %1, %2, %7\n\t"
         "ldp       %3, %4, %8\n\t"
         "ccmp      %1, %3, 0, eq\n\t"
         "ccmp      %2, %4, 0, eq\n\t"
         "cset      %0, eq"
         /* "&" marks early clobber: the temporaries are written
            before the last memory operands have been read. */
         : "=r" (ret), "=&r" (l1), "=&r" (l2), "=&r" (r1), "=&r" (r2)
         : "Ump" (lv[0]), "Ump" (rv[0]), "Ump" (lv[1]), "Ump" (rv[1])
         : "cc");

     return ret;
}

Ok, maybe it's a little unwieldy.

Ross Ridge answered Sep 30 '22 19:09
