Auto vectorization with Rust

Question

I am new with rust/SIMD and I have the following code snippet as the bottleneck of my program, I am wondering if I can leverage on the autovectorization feature of it

fn is_subset(a: Vec<i64>, b: Vec<i64>) -> bool {
    for i in 0..a.len() {
        if (a[i] & !b[i]) != 0 {
            return false;
        }
    }
    true
}

I also have an alternative way for writing this (with iterator so trip count will be known up front), would this create autovectorization instead?

fn is_subset(a: Vec<i64>, b: Vec<i64>) -> bool {
    return a.iter().zip(b.iter()).all(|(x, y)| x & y == *x)
}

Peter Cordes · Accepted Answer

LLVM (and GCC) don't know how to auto-vectorize loops whose trip-count can't be calculated up front. This rules out search loops like this.

ICC classic can auto-vectorize such loops, but it's a C/C++ compiler, without a front-end for Rust.

Probably your only hope would be to manually loop over 2, 4, or 8-element chunks of the arrays, branchlessly calculating your condition based on all those elements. If you're lucky, LLVM might turn that into operations on one SIMD vector. So using that inner loop inside a larger loop could result in getting the compiler to make vectorized asm, for example using AVX vptest (which sets CF according to bitwise a AND (not b) having any non-zero bits).

i.e. manually express the "unrolling" of SIMD elements in your source, for a specific vector width.

Related re: getting compilers to auto-vectorize with x86 ptest:

Getting GCC to generate a PTEST instruction when using vector extensions
how to auto vectorization array comparison function
GCC C vector extension: How to check if result of ANY element-wise comparison is true, and which?

If the whole array is small enough, a branchless reduction over the whole array (ORing boolean results together) could be something a compiler is willing to do something with. If compiling for x86, you'd be hoping for asm like pandn / por in the loop, and a horizontal reduction at the end, so see if any bits were set in the intersection of a and not b.

i64 is only 2 elements per 16-byte vector, so the compiler would have to see a good strategy for it to be profitable to auto-vectorize, especially on a 64-bit machine. Having 32-byte vectors available makes it more attractive.

Auto vectorization with Rust

Tags:

rust

simd

auto-vectorization

xxx222

1 Answers

Peter Cordes

Recent Activity

Donate For Us

Auto vectorization with Rust

Tags:

rust

simd

auto-vectorization

xxx222

1 Answers

Peter Cordes

Related questions

Recent Activity

Donate For Us