I implemented the following code which converts 32 bytes of input to uppercase: Version 1: <pre class="prettyprint"><code>void to_upper(char* input) { for (int i = 0; i < 32; ++i) { input[i] = (input[i] >= 'a' && input[i] <= 'z') ? input[i] - 32 : input[i]; } } </code></pre> Version 2: <pre class="prettyprint"><code>void to_upper(char* input) { for (int i = 0; i < 32; ++i) { if (input[i] >= 'a' && input[i] <= 'z') { input[i] = input[i] - 32; // same for: input[i] -= 32; } } } </code></pre> The first version gets autovectorized, the second doesn't. The behaviour is consistent across clang and gcc. Moreover, I implemented both versions in Rust, too, and the Rust compiler does the same, Version 1 autovectorized, Version 2 isn't. What is limiting the compiler from vectorizing the second version?

Basically optimization passes have harder time recognizing the conditional assignment in second loop. In first case GCC generates a slightly different intermediate representation which allows if-conversion pass to convert code to vectorizable form: <pre class="prettyprint"><code> <bb 3>: # i_18 = PHI <i_14(4), 0(2)> # ivtmp_24 = PHI <ivtmp_21(4), 32(2)> _6 = (sizetype) i_18; _7 = input_5(D) + _6; _8 = *_7; _9 = (unsigned char) _8; _10 = _9 + 159; _11 = _9 + 224; iftmp.0_12 = (char) _11; iftmp.0_2 = _10 <= 25 ? iftmp.0_12 : _8; *_7 = iftmp.0_2; i_14 = i_18 + 1; ivtmp_21 = ivtmp_24 - 1; if (ivtmp_21 != 0) goto <bb 4>; else goto <bb 5>; </code></pre> whereas in second case code contains spurious jumps which complicate analysis and break vectorization: <pre class="prettyprint"><code> <bb 3>: # i_15 = PHI <i_14(6), 0(2)> # ivtmp_26 = PHI <ivtmp_25(6), 32(2)> _5 = (sizetype) i_15; _7 = input_6(D) + _5; _8 = *_7; _9 = (unsigned char) _8; _10 = _9 + 159; if (_10 <= 25) goto <bb 4>; else goto <bb 5>; <bb 4>: _11 = _9 + 224; _12 = (char) _11; *_7 = _12; <bb 5>: i_14 = i_15 + 1; ivtmp_25 = ivtmp_26 - 1; if (ivtmp_25 != 0) goto <bb 6>; else goto <bb 7>; </code></pre> Many optimization passes work as pattern matchers which recognize and optimize common cases so I wouldn't be surprised with this behavior. You can try filing a bug in GCC tracker.

Why is the 'simplified' code not vectorized

Q: What is “vectorized code?

Performing the exact same operation on every entry in a structure (but with different values) is the essence of “vectorized code.” When we wrote a for -loop in xlin_fits_R () to perform the same steps for each index, we were in fact fighting the language.

Q: What are the advantages of vectorization?

Vectorized code has many advantages, among which are: the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs) vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult to read for loops.

Q: What is the syntax in the vectorization approach?

The syntax in the vectorization approach when the complier see that, it will translate it into optimized CPU instructions that multiplies vectors. Like SIMD. @mskw: That's pseudo-code, not actual syntax for a C vector extension.

Q: What is vectorization in C++?

Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas scalar can only operate on pairs of operands at once. isn't that in essence same as Scalar approach?

Tags:

c++

vectorization

gcc

rust

clang

I implemented the following code which converts 32 bytes of input to uppercase:

Version 1:

void to_upper(char* input) {
    for (int i = 0; i < 32; ++i) { 
        input[i] = (input[i] >= 'a' && input[i] <= 'z') ? input[i] - 32 : input[i]; 
    }
}

Version 2:

void to_upper(char* input) {
    for (int i = 0; i < 32; ++i) { 
        if (input[i] >= 'a' && input[i] <= 'z') {
            input[i] = input[i] - 32; // same for: input[i] -= 32;
        }
    }
}

The first version gets autovectorized, the second doesn't. The behaviour is consistent across clang and gcc. Moreover, I implemented both versions in Rust, too, and the Rust compiler does the same, Version 1 autovectorized, Version 2 isn't.

What is limiting the compiler from vectorizing the second version?

483

asked Jul 20 '20 21:07

David Frank

1 Answers

Basically optimization passes have harder time recognizing the conditional assignment in second loop.

In first case GCC generates a slightly different intermediate representation which allows if-conversion pass to convert code to vectorizable form:

  <bb 3>:
  # i_18 = PHI <i_14(4), 0(2)>
  # ivtmp_24 = PHI <ivtmp_21(4), 32(2)>
  _6 = (sizetype) i_18;
  _7 = input_5(D) + _6;
  _8 = *_7;
  _9 = (unsigned char) _8;
  _10 = _9 + 159;
  _11 = _9 + 224;
  iftmp.0_12 = (char) _11;
  iftmp.0_2 = _10 <= 25 ? iftmp.0_12 : _8;
  *_7 = iftmp.0_2;
  i_14 = i_18 + 1;
  ivtmp_21 = ivtmp_24 - 1;
  if (ivtmp_21 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

whereas in second case code contains spurious jumps which complicate analysis and break vectorization:

  <bb 3>:
  # i_15 = PHI <i_14(6), 0(2)>
  # ivtmp_26 = PHI <ivtmp_25(6), 32(2)>
  _5 = (sizetype) i_15;
  _7 = input_6(D) + _5;
  _8 = *_7;
  _9 = (unsigned char) _8;
  _10 = _9 + 159;
  if (_10 <= 25)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  _11 = _9 + 224;
  _12 = (char) _11;
  *_7 = _12;

  <bb 5>:
  i_14 = i_15 + 1;
  ivtmp_25 = ivtmp_26 - 1;
  if (ivtmp_25 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

Many optimization passes work as pattern matchers which recognize and optimize common cases so I wouldn't be surprised with this behavior.

You can try filing a bug in GCC tracker.

answered Oct 12 '22 23:10

yugr

Related questions
                            
                                How to make clang -E omit precompiled header
                            
                                Is it permissible for standard library implementation to have class definition that is different from the C++ standard?
                            
                                ccache cache miss slow down compilation a lot
                            
                                Why shouldn't C++ operator new/delete/variants be in header files?
                            
                                Why does gcc reject `new v < new v` when `v` is not a template?
                            
                                Implicit conversion of both operands for operator==
                            
                                Why can't one compare a function pointer to a template function without explicit & on function name?
                            
                                Brace initialization prevents non-const use of temporary
                            
                                C++ When are characters widened in output stream operator<<()?
                            
                                What is “exception vomiting”?
                            
                                Visual Studio Code use input text file on debug
                            
                                compile-time validation of constexpr std::initializer_list
                            
                                Does C++17 forbid copy elision in a case where C++14 allowed it?
                            
                                Difference between parsing of void() and int()
                            
                                Ambiguous partial specializations with Clang in C++17
                            
                                How to force east const in clang format?
                            
                                Is moving an object into malloc'd memory valid C++?
                            
                                How to format floating point numbers with decimal comma using the fmt library?
                            
                                Non-type template parameter type changes randomly
                            
                                Passing a concept to a function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With