Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Obtaining remainder using single aarch64 instruction?

I am writing some assembly code for ARM8 (aarch64). I want to do a division and use the remainder obtained for further calculations. In x86 when I use 'div', and I know my remainder is kept in RDX. My question is - is there an equivalent to that in the aarch64 instruction set? I know 'udiv' and 'sdiv' do unsigned and signed divisions and gets me the quotient. Is there a single instruction which would give me remainder? (I want % modulo operator in c). I understand I can obtain it using algebra, just wanted to confirm I am not missing out on a simpler method.

like image 249
c_mnon Avatar asked Feb 11 '16 22:02

c_mnon


Video Answer


2 Answers

Barring constant power-of-two divisors which can be optimised down to an and, there is no instruction that will calculate the remainder of a division. You can, however do it pretty neatly in two:

// input: x0=dividend, x1=divisor
udiv x2, x0, x1
msub x3, x2, x1, x0
// result: x2=quotient, x3=remainder
like image 106
Notlikethat Avatar answered Sep 19 '22 00:09

Notlikethat


Calculating the remainder isn't a single instruction

The Clang C compiler produced the following code for a modulo calculation:

udiv    x10, x0, x9
msub    x10, x10, x9, x0

Good news, this isn't slow!

While the x86 does this in a single instruction, that doesn't make it faster.

On the Apple M-1, the above instruction pair is executed in about the same time as a single step. Possibly this is due to instruction macro-fusion that decodes multiple instructions into a single µ-op. It could also be due to parallelism in multiple execution units. Possibly, it is done in one EU where the remainder from the division calculation is cached and returned immediately.

Whatever the implementation, it seems to be about as fast as Intel's single instruction form.

Division-only

Timing:

$ time ./a.out 12345678901
Total: 301123495054
real    0m10.036s
user    0m9.668s
sys 0m0.031s

Instruction generated:

udiv    x10, x0, x9

Remainder-only

Timing:

$ time ./a.out 12345678901
Total: 8612082846779832640
real    0m10.190s
user    0m9.768s
sys 0m0.070s

Instructions generated:

udiv    x10, x0, x9
msub    x10, x10, x9, x0

Division and remainder

Timing:

$ time ./a.out 12345678901
Total: 8612083123211969892
real    0m10.103s
user    0m9.752s
sys 0m0.019s

Instructions generated:

udiv    x10, x0, x9
msub    x11, x10, x9, x0

Benchmark code

The following C code can be run with either q = n / d or r = n % d commented out:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    unsigned long long n, d, q=1, r=1, total=0;

    n = strtoull(argv[1], NULL, 10);
    total = 0;
    for (d=1 ; d<=n ; d++) {
        q = n / d;
        r = n % d;
        total += q + r;
    }
    printf("Total: %llu", total);
    return 0;
}
like image 45
Raymond Hettinger Avatar answered Sep 20 '22 00:09

Raymond Hettinger