Generate FMOV without inline assembly

I want to:

  • Move a 64-bit value from a GPR to the lower 64 bits of a vector register
  • Do an operation (specifically bdep or bext)
  • Move the lower 64 bits of my vector register to a GPR

This doesn't seem to be possible using ACLE intrinsics.
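For reference, here is a minimal scalar sketch of what bdep and bext compute on a single 64-bit element (the same semantics as x86 PDEP/PEXT). The helper names bdep64 and bext64 are hypothetical and are not ACLE intrinsics; they are only a portable model, e.g. for checking vector results in tests.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of SVE2 BDEP (bit deposit):
// the low-order bits of `data` are scattered, in order, into the bit
// positions that are set in `mask`, starting from the lowest set bit.
constexpr std::uint64_t bdep64(std::uint64_t data, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (data & bit) result |= lowest;                 // deposit next data bit there
        mask &= mask - 1;                                 // clear that mask bit
    }
    return result;
}

// Hypothetical scalar model of one 64-bit lane of SVE2 BEXT (bit extract):
// the bits of `data` at the positions set in `mask` are gathered, in order,
// into the low-order bits of the result.
constexpr std::uint64_t bext64(std::uint64_t data, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (data & lowest) result |= bit;                 // pack it into the next result bit
        mask &= mask - 1;                                 // clear that mask bit
    }
    return result;
}

// Mask 0b11010 selects bit positions 1, 3 and 4.
static_assert(bdep64(0b101, 0b11010) == 0b10010, "deposit 1,0,1 into bits 1,3,4");
static_assert(bext64(0b10010, 0b11010) == 0b101, "extract bits 1,3,4 back out");
```

Both loops run once per set bit of the mask, which keeps the model simple rather than fast; the point of the hardware instructions is that one bdep or bext does this for every lane at once.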

This is the closest I can get using intrinsics: https://godbolt.org/z/brjG6fe38

    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return svlastb_u64(svptrue_b64(), vec);

which Clang compiles to

    foo(unsigned long, unsigned long):
            mov     z0.d, x0
            ptrue   p0.d
            mov     z1.d, x1
            bdep    z0.d, z0.d, z1.d
            lastb   x0, p0, z0.d
            ret

The compiler emits mov (the preferred alias of dup) for the broadcasts, which is great. However, it still generates lastb, plus the ptrue that feeds it, which is completely wasteful since I only need the low 64 bits. An fmov would do just fine.
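Why lastb gives the right answer at all: svdup_n_u64 broadcasts the same value into every lane and svbdep_u64 works lane by lane, so every lane of vec holds the same 64-bit result. A small scalar model makes this concrete. It assumes a hypothetical 256-bit vector length (4 lanes), and the dup, lanewise, last_active and low_lane helpers are illustrative stand-ins, not ACLE functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: 256-bit vectors, i.e. 4 lanes of 64 bits. SVE vector lengths
// vary between implementations; the argument does not depend on the count.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::uint64_t, kLanes>;

// Model of svdup_n_u64: broadcast one scalar into every lane.
Vec dup(std::uint64_t x) {
    Vec v{};
    v.fill(x);
    return v;
}

// Model of any lane-wise binary operation (svbdep_u64 is one such operation).
template <class Op>
Vec lanewise(const Vec& a, const Vec& b, Op op) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = op(a[i], b[i]);
    return r;
}

// Model of svlastb_u64(svptrue_b64(), v): the last active lane, which is
// the highest-numbered lane when every predicate bit is set.
std::uint64_t last_active(const Vec& v) { return v[kLanes - 1]; }

// Model of reading element 0, the low 64 bits (what fmov x0, d0 reads).
std::uint64_t low_lane(const Vec& v) { return v[0]; }
```

With broadcast inputs, last_active and low_lane always agree, so reading the low element is sufficient. An operation that mixes lanes (a reduction or a permute) would break this equivalence; bdep and bext do not.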

Am I missing something, or is this basic operation not supported by ACLE intrinsics?

asked Feb 21 '26 at 01:02 by Elliot Gorokhovsky



1 Answer

GNU C native vector syntax allows indexing a vector with [].

return vec[0] compiles to fmov. I don't know SVE very well, and I haven't checked how Clang's <arm_sve.h> defines the vector types.

This is unlikely to be portable to other compilers, especially MSVC.

    #include <arm_sve.h>
    #include <cstdint>

    uint64_t foo(uint64_t a, uint64_t b) {
        const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
        return vec[0];  // GNU C vector subscript: element 0, the low 64 bits
    }

Godbolt:

    foo(unsigned long, unsigned long):
            mov     z0.d, x0
            mov     z1.d, x1
            bdep    z0.d, z0.d, z1.d
            fmov    x0, d0
            ret
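The same [] subscript also works on GCC/Clang fixed-width vector types declared with __attribute__((vector_size(N))), which is a convenient way to try the syntax on any target. A minimal sketch: v2u64, xor_lanes and element0 are hypothetical names, and this is a 128-bit GNU C vector, not one of the sizeless SVE types from <arm_sve.h>.

```cpp
#include <cstdint>

// GNU C vector extension (GCC and Clang): two 64-bit lanes, 128 bits total.
// This is a fixed-width type, unlike the sizeless svuint64_t.
typedef std::uint64_t v2u64 __attribute__((vector_size(16)));

// Ordinary operators apply lane by lane on GNU vector types.
v2u64 xor_lanes(v2u64 a, v2u64 b) { return a ^ b; }

// Subscripting reads a single element; [0] is the low 64 bits.
std::uint64_t element0(v2u64 v) { return v[0]; }
```

On AArch64, reading element 0 this way typically compiles to a single move out of the low lane, but the exact instruction chosen is up to the compiler.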
answered Feb 24 '26 at 04:02 by Peter Cordes
