I want to apply an SVE2 bit-permute instruction (bdep or bext) to scalar operands and get the result back as a scalar.
This doesn't seem to be possible using ACLE intrinsics.
This is the closest I can get using intrinsics: https://godbolt.org/z/brjG6fe38
uint64_t foo(uint64_t a, uint64_t b) {
const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
return svlastb_u64(svptrue_b64(), vec);
}
which Clang compiles to
foo(unsigned long, unsigned long):
mov z0.d, x0
ptrue p0.d
mov z1.d, x1
bdep z0.d, z0.d, z1.d
lastb x0, p0, z0.d
ret
The compiler is able to replace dup with mov, which is great. However, it still generates ptrue and lastb, which is wasteful: every lane holds the same value, so I only need the low 64 bits of the result. An fmov would do just fine.
Am I missing something, or is this basic operation not supported by ACLE intrinsics?
GNU C native vector syntax allows indexing a vector with [], and Clang accepts this on the SVE vector type here.
return vec[0]; compiles to fmov instead of lastb. I don't know SVE very well, and I haven't checked how Clang's <arm_sve.h> defines the vector types.
This is unlikely to be portable to other compilers, especially MSVC.
uint64_t foo(uint64_t a, uint64_t b) {
const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
return vec[0];
}
Godbolt:
foo(unsigned long, unsigned long):
mov z0.d, x0
mov z1.d, x1
bdep z0.d, z0.d, z1.d
fmov x0, d0
ret
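Since the vec[0] trick may not be portable, here is a minimal sketch of how one might guard it and fall back to plain scalar code elsewhere. The helper name bdep_scalar is hypothetical, and the fallback is an ordinary bit loop, not something the compiler will turn into a single instruction. The guard assumes the compiler defines the ACLE feature macro __ARM_FEATURE_SVE2_BITPERM when bdep is available, and that only Clang accepts [] on SVE vectors; both are assumptions to check against your toolchain.

```cpp
#include <cstdint>

#if defined(__ARM_FEATURE_SVE2_BITPERM) && defined(__clang__)
#include <arm_sve.h>
#endif

// Hypothetical portable fallback: deposit the low bits of `data` into the
// positions of the set bits of `mask`, lowest position first. This is what
// bdep does to each 64-bit element.
constexpr std::uint64_t bdep_scalar(std::uint64_t data, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t src_bit = 1; mask != 0; src_bit <<= 1) {
        const std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (data & src_bit) {
            result |= lowest;  // copy the current data bit into that position
        }
        mask &= mask - 1;  // clear the mask bit just handled
    }
    return result;
}

std::uint64_t foo(std::uint64_t a, std::uint64_t b) {
#if defined(__ARM_FEATURE_SVE2_BITPERM) && defined(__clang__)
    // Assumption: Clang's [] on SVE vectors reads element 0 (an fmov).
    const auto vec = svbdep_u64(svdup_n_u64(a), svdup_n_u64(b));
    return vec[0];
#else
    return bdep_scalar(a, b);
#endif
}
```

On targets without the bit-permute extension, foo(0xB, 0x1A) takes the fallback path: the low three bits of 0xB (1, 1, 0) land on mask bits 1, 3 and 4.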