When I try to store and load 256bits to and from an AVX2 256bit vector, I'm not receiving expected output in release mode.
use std::arch::x86_64::*;
fn main() {
let key = [1u64, 2, 3, 4];
let avxreg = unsafe { _mm256_load_si256(key.as_ptr() as *const __m256i) };
let mut back_key = [0u64; 4];
unsafe { _mm256_storeu_si256(back_key.as_mut_ptr() as *mut __m256i, avxreg) };
println!("back_key: {:?}", back_key);
}
playground
In debug mode:
back_key: [1, 2, 3, 4]
In release mode:
back_key: [1, 2, 0, 0]
The back half either isn't being loaded or stored and I can't figure out which.
What's weird is targeting a native CPU works. In release mode + RUSTFLAGS="-C target-cpu=native"
back_key: [1, 2, 3, 4]
I've even tried to rid myself of Clippy errors by forcing alignment to no avail (I'm not sure if the code below is even considered more correct).
use std::arch::x86_64::*;
#[repr(align(256))]
#[derive(Debug)]
struct Key([u64; 4]);
fn main() {
let key = Key([1u64, 2, 3, 4]);
let avxreg = unsafe { _mm256_load_si256(&key as *const _ as *const __m256i) };
let mut back_key = Key([0u64; 4]);
unsafe { _mm256_storeu_si256((&mut back_key) as *mut _ as *mut __m256i, avxreg) };
println!("back_key: {:?}", back_key);
}
After more thoroughly reading the docs, it became clear that I had to extract the body into another function and force that function to be compiled with AVX2 by annotating it with
#[target_feature(enable = "avx2")]
Or compile the entire program with
RUSTFLAGS="-C target-feature=+avx2" cargo run --release
The first option is better because it guarantees that the SIMD instructions used in a function are compiled appropriately, it's just on the caller to check their CPU has those capabilities before calling with is_x86_feature_detected!("avx2")
. All this is documented, but it would be amazing if the compiler could warn with "hey, this function uses AVX2 instructions, but was not annotated with #[target_feature(enable = "avx2")]
and the program was not compiled with AVX2 enabled globally, so calling this function is undefined behavior". It would have saved me a lot of headache!
Since relying on undefined behavior is bad, our initial program on the playground should be written as:
use std::arch::x86_64::*;
fn main() {
unsafe { run() }
}
#[target_feature(enable = "avx2")]
unsafe fn run() {
let key = [1u64, 2, 3, 4];
let avxreg = _mm256_load_si256(key.as_ptr() as *const __m256i);
let mut back_key = [0u64; 4];
_mm256_storeu_si256(back_key.as_mut_ptr() as *mut __m256i, avxreg);
println!("back_key: {:?}", back_key);
}
Some notes:
main
cannot be unsafe and thus can't be annotated with target_feature
, so it is necessary to extract into another functionx86_64
CPU running the code has avx
capabilities, so make sure you check before callingIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With