Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorizing a conditional involving shorts

I'm using a compact struct of 2 unsigned shorts indicating a start and end position.
I need to be able to quickly determine if there are any Range objects with a length (difference from start to end) past a threshold value.

I'm going to have a huge quantity of objects each with their own Range array, so it is not feasible to track which Range objects are above the threshold in a list or something. This code is also going to be run very often (many times a second for each array), so it needs to be efficient.

struct Range
{
 unsigned short start;
 unsigned short end;
}

I will always have an array of Range sized 2^n. While I would like to abort as soon as I find something over the threshold, I'm pretty sure it'd be faster to simply OR it all together and check at the end... assuming I can vectorize the loop. Although if I could do an if statement on the chunk of results for each vector, that would be grand.

size_t rangecount = 1 << resolution;
Range* ranges = new Range[rangecount];

...

bool result = false;
for (size_t i = 0; i < ranges; ++i)
{
 result |= (range[i].end - range[i].start) > 4;
}

Not surprisingly, the auto-vectorizer gives the 1202 error because my data type isn't 32 or 64 bits wide. I really don't want to double my data size and make each field an unsigned int. So I'm guessing the auto-vectorizer approach is out for this.

Are there vector instructions that can handle 16 bit variables? If there are, how could I use them in c++ to vectorize my loop?

like image 896
user173342 Avatar asked Dec 21 '12 16:12

user173342


People also ask

What does it mean to vectorize an operation?

Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at one time. Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD).

How do you vectorize a variable in Python?

outer(a, b): Compute the outer product of two vectors. multiply(a, b): Matrix product of two arrays. dot(a, b): Dot product of two arrays. zeros((n, m)): Return a matrix of given shape and type, filled with zeros.

What is vectorization in C?

Vectorization means that the compiler detects that your independent instructions can be executed as one SIMD instruction. Usual example is that if you do something like for(i=0; i<N; i++){ a[i] = a[i] + b[i]; }

What is vectorizing programming?

Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas scalar can only operate on pairs of operands at once.


1 Answers

You are wondering if any value is greater than 4?

Yes, there are SIMD instructions for this. It's unfortunate that the auto-vectorized isn't able to handle this scenario. Here's a vectorized algorithm:

diff_v = end_v - start_v; // _mm_hsub_epi16 
floor_v = max(4_v, diff_v); // _mm_max_epi16 
if (floor_v != 4_v) return true; // wide scalar comparison

Use _mm_sub_epi16 with a structure of arrays or _mm_hsub_epi16 with an array of structures.

Actually since start is stored first in memory, you will be working on start_v - end_v, so use _mm_min_epi16 and a vector of -4.

Each SSE3 instruction will perform 8 comparisons at a time. It will still be fastest to return early instead of looping. However, unrolling the loop a bit more may buy you additional speed (pass the first set of results into the packed min/max function to combine them).

So you end up with (approximately):

most_negative = threshold = _mm_set_epi64(0xFCFCFCFCFCFCFCFC); // vectorized -4

loop:
    a = load from range;
    b = load from range;
    diff = _mm_hsub_epi16(a, b);
    most_negative = _mm_min_epi16(most_negative, diff);

    // unroll by repeating the above four instructions 4 times or so
    if (most_negative != threshold) return true;
repeat loop
like image 103
Ben Voigt Avatar answered Oct 06 '22 07:10

Ben Voigt