Generate FMOV without inline assembly

Question

I want to:

Move a 64-bit value from a GPR to the lower 64 bits of a vector register
Do an operation (specifically bdep or bext)
Move the lower 64 bits of my vector register back to a GPR
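For reference, bdep behaves like x86 PDEP: the low-order bits of the first operand are deposited, in order, into the bit positions that are set in the second operand. The following is a minimal scalar sketch of that behaviour. bdep_ref is a hypothetical name, not an ACLE function. It is only meant for checking what the intrinsics produce, and it assumes C++14 or later for the constexpr loop.

```cpp
#include <cstdint>

// Scalar reference for BDEP (same semantics as x86 PDEP):
// the low-order bits of `data` are deposited, lowest first, into the
// bit positions that are set in `mask`; every other result bit is zero.
constexpr uint64_t bdep_ref(uint64_t data, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t src_bit = 1; mask != 0; src_bit <<= 1) {
        const uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (data & src_bit) result |= lowest;        // copy next data bit there
        mask &= mask - 1;                            // clear that mask bit
    }
    return result;
}
```

For example, bdep_ref(0b101, 0b11010) yields 0b10010: the three low data bits 1, 0, 1 land in mask positions 1, 3 and 4.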

This doesn't seem to be possible using ACLE intrinsics.

This is the closest I can get using intrinsics: https://godbolt.org/z/brjG6fe38

    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return svlastb_u64(svptrue_b64(), vec);

which Clang compiles to

    foo(unsigned long, unsigned long):
            mov     z0.d, x0
            ptrue   p0.d
            mov     z1.d, x1
            bdep    z0.d, z0.d, z1.d
            lastb   x0, p0, z0.d
            ret

The compiler emits mov for the broadcasts (mov is the preferred disassembly alias of dup), which is great. However, it still generates lastb, plus a ptrue to supply its predicate, which is completely wasteful since I only need the lowest 64 bits. An fmov would do just fine.

Am I missing something, or is this basic operation not supported by ACLE intrinsics?

Peter Cordes · Accepted Answer

GNU C native vector syntax allows indexing a vector with [].

return vec[0] compiles to fmov. I don't know SVE very well, and I haven't checked how Clang's <arm_sve.h> defines the vector types.

This is unlikely to be portable to other compilers, especially MSVC.
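The [] subscript in the answer's code is the same GNU C vector extension that GCC and Clang accept on ordinary fixed-width vector types. Here is a minimal illustration, assuming GCC or Clang. The names u64x2 and low_lane are hypothetical and chosen for this sketch only.

```cpp
#include <cstdint>

// GNU C vector extension (GCC and Clang): a 128-bit vector of two uint64_t.
typedef uint64_t u64x2 __attribute__((vector_size(16)));

// Read lane 0 with plain [] subscripting; no intrinsic is needed.
inline uint64_t low_lane(u64x2 v) { return v[0]; }
```

This fixed-width form works on any target. Applying [] to sizeless SVE types such as svuint64_t is the part the answer relies on Clang for.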

    #include <arm_sve.h>
    #include <cstdint>

    uint64_t foo(uint64_t a, uint64_t b) {
        const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
        return vec[0];  // GNU-style vector subscript: read lane 0
    }

Godbolt:

    foo(unsigned long, unsigned long):
            mov     z0.d, x0
            mov     z1.d, x1
            bdep    z0.d, z0.d, z1.d
            fmov    x0, d0
            ret
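The question also mentions bext. The following sketch applies the same vec[0] idea to svbext_u64 and adds a scalar fallback for compilers or targets where that path is unavailable. Several parts are assumptions made for illustration: the guard on __clang__ and __ARM_FEATURE_SVE2_BITPERM, the example flag in the comment, and the hypothetical names extract_bits and bext_scalar. The bext path would presumably compile to the same fmov as the bdep case above, but that is an expectation, not a checked result.

```cpp
#include <cstdint>
#if defined(__clang__) && defined(__ARM_FEATURE_SVE2_BITPERM)
#include <arm_sve.h>
#endif

// Portable scalar BEXT (same semantics as x86 PEXT): the bits of `data`
// at the positions set in `mask` are packed, in order, into the low bits.
constexpr uint64_t bext_scalar(uint64_t data, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t dst_bit = 1; mask != 0; dst_bit <<= 1) {
        const uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (data & lowest) result |= dst_bit;        // pack it into next slot
        mask &= mask - 1;                            // clear that mask bit
    }
    return result;
}

// Hypothetical wrapper. Assumes Clang's [] subscript on SVE types and a
// target with SVE2 BITPERM (e.g. -march=armv9-a+sve2-bitperm).
inline uint64_t extract_bits(uint64_t data, uint64_t mask) {
#if defined(__clang__) && defined(__ARM_FEATURE_SVE2_BITPERM)
    // Broadcast both operands, run BEXT on every lane, read lane 0.
    const svuint64_t vec = svbext_u64(svdup_n_u64(data), svdup_n_u64(mask));
    return vec[0];
#else
    return bext_scalar(data, mask);  // fallback for other compilers/targets
#endif
}
```

Because both inputs are broadcasts, every lane holds the same result, so reading lane 0 loses nothing compared with lastb.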


Tags: clang, micro-optimization, simd, arm64, sve

Elliot Gorokhovsky
