Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the 'simplified' code not vectorized

I implemented the following code which converts 32 bytes of input to uppercase:

Version 1:

void to_upper(char* input) {
    for (int i = 0; i < 32; ++i) { 
        input[i] = (input[i] >= 'a' && input[i] <= 'z') ? input[i] - 32 : input[i]; 
    }
}

Version 2:

void to_upper(char* input) {
    for (int i = 0; i < 32; ++i) { 
        if (input[i] >= 'a' && input[i] <= 'z') {
            input[i] = input[i] - 32; // same for: input[i] -= 32;
        }
    }
}

The first version gets autovectorized, the second doesn't. The behaviour is consistent across clang and gcc. Moreover, I implemented both versions in Rust, too, and the Rust compiler does the same, Version 1 autovectorized, Version 2 isn't.

What is limiting the compiler from vectorizing the second version?

like image 483
David Frank Avatar asked Jul 20 '20 21:07

David Frank


People also ask

What is vectorization in C?

Vectorization means that the compiler detects that your independent instructions can be executed as one SIMD instruction. Usual example is that if you do something like for(i=0; i<N; i++){ a[i] = a[i] + b[i]; }

What is “vectorized code?

Performing the exact same operation on every entry in a structure (but with different values) is the essence of “vectorized code.” When we wrote a for -loop in xlin_fits_R () to perform the same steps for each index, we were in fact fighting the language.

What are the advantages of vectorization?

Vectorized code has many advantages, among which are: the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs) vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult to read for loops.

What is the syntax in the vectorization approach?

The syntax in the vectorization approach when the complier see that, it will translate it into optimized CPU instructions that multiplies vectors. Like SIMD. @mskw: That's pseudo-code, not actual syntax for a C vector extension.

What is vectorization in C++?

Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas scalar can only operate on pairs of operands at once. isn't that in essence same as Scalar approach?


1 Answers

Basically optimization passes have harder time recognizing the conditional assignment in second loop.

In first case GCC generates a slightly different intermediate representation which allows if-conversion pass to convert code to vectorizable form:

  <bb 3>:
  # i_18 = PHI <i_14(4), 0(2)>
  # ivtmp_24 = PHI <ivtmp_21(4), 32(2)>
  _6 = (sizetype) i_18;
  _7 = input_5(D) + _6;
  _8 = *_7;
  _9 = (unsigned char) _8;
  _10 = _9 + 159;
  _11 = _9 + 224;
  iftmp.0_12 = (char) _11;
  iftmp.0_2 = _10 <= 25 ? iftmp.0_12 : _8;
  *_7 = iftmp.0_2;
  i_14 = i_18 + 1;
  ivtmp_21 = ivtmp_24 - 1;
  if (ivtmp_21 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

whereas in second case code contains spurious jumps which complicate analysis and break vectorization:

  <bb 3>:
  # i_15 = PHI <i_14(6), 0(2)>
  # ivtmp_26 = PHI <ivtmp_25(6), 32(2)>
  _5 = (sizetype) i_15;
  _7 = input_6(D) + _5;
  _8 = *_7;
  _9 = (unsigned char) _8;
  _10 = _9 + 159;
  if (_10 <= 25)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  _11 = _9 + 224;
  _12 = (char) _11;
  *_7 = _12;

  <bb 5>:
  i_14 = i_15 + 1;
  ivtmp_25 = ivtmp_26 - 1;
  if (ivtmp_25 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

Many optimization passes work as pattern matchers which recognize and optimize common cases so I wouldn't be surprised with this behavior.

You can try filing a bug in GCC tracker.

like image 84
yugr Avatar answered Oct 12 '22 23:10

yugr