I was watching a talk, <a href="https://www.youtube.com/watch?v=fHNmRkzxHWs" rel="nofollow noreferrer">"Efficiency with Algorithms, Performance with Data Structures"</a>, and was surprised by the comment that in: <pre class="prettyprint"><code>#include <string> #include <unordered_map> #include <memory> struct Foo { int x; }; Foo* getFoo(std::string key, std::unordered_map<std::string, std::unique_ptr<Foo>> &cache) { if (cache[key]) return cache[key].get(); cache[key] = std::unique_ptr<Foo>(new Foo()); return cache[key].get(); } Foo* getFooBetter(std::string key, std::unordered_map<std::string, std::unique_ptr<Foo>> &cache) { std::unique_ptr<Foo> &entry = cache[key]; if (entry) return entry.get(); entry = std::unique_ptr<Foo>(new Foo()); return entry.get(); } </code></pre> <code>getFooBetter()</code> is better. I had been under the belief that I could rely on the compiler for performing this sort of transformation in the same way that I'd expect multiple occurrences of <code>x+y</code> to be evaluated only once. Unsurprisingly, the generated LLVM IR indeed agrees with the presenter. Even with -O9, we're left with 3 calls to <code>cache[key]</code> in the <code>getFoo()</code> version. I've moved the long LLVM IR of both with c++ symbols unmangled out of line so as not to be visually offensive. Another StackOverflow question reveals that part of the answer here is that <code>operator[]</code> is assumed to be able to modify whatever global state it wishes, and thus we can't elide calls. A linked proposal about introducing a <code>[[pure]]</code> annotation talks about its applications to CSE. If we stayed at 4 calls, I'd be able to end here feeling satisfied. However, if my reading of the IR is correct, it looks like we optimized <code>getFoo()</code> into as if we wrote: <pre class="prettyprint"><code>Foo* getFoo(std::string key, std::unordered_map<std::string, std::unique_ptr<Foo>> &cache) { if (cache[key]) return cache[key].get(); std::unique_ptr<Foo> &entry = cache[key]; entry = std::unique_ptr<Foo>(new Foo()); return entry.get(); } </code></pre> Would someone be able to explain what clang's view of the code is such that it was able to merge the two last <code>cache[key]</code>s, but not all of them? (My local clang is 3.4.)

CSE implementation in llvm is operating on Arithmatic Expressions. You can see llvm Common Subexpression Elimination source code in llvm/lib/Transforms/Scalar/EarlyCSE.cpp Case we face here is Interprocedural Optimizations. This call <code>cache[key]</code> turn out to be <code>[](cache,key)</code> function. so optimizations such as inlining may come into action depending upon cost of inlining of [] function. Chandler was mentioning same issue, given hash function is computationally expensive, inlining is prevented, one ends up computing hash function more than once! In case inlining happened, IR at -O3, <code>cache[key]</code> was computed first and given <code>cache</code> <code>key</code> has not mutated at all such call will be optimized to same SSA value. In case of <code>cache[key].get()</code>, we would normally write IR as cache[key] returns object and getting value of field using getelementpointer inside <code>get()</code> . With optimization turned on this IR turn out to be our previously calculated SSA value for 'cache[key]` with element accessing from struct of unique pointer. Coming Back to <code>getFooBetter()</code> in worst case if compiler fails to do optimizations across procedures, more occurrences of <code>cache[key]</code> will result in more computation and this call even at O3 will appear as it is!

Limitations of Common Subexpression Elimination in C++

Q: Which of the following is an example of common subexpression elimination?

(D) x = 4 &lowast; 5 => x = 20 is an example of common subexpression elimination.

I was watching a talk, "Efficiency with Algorithms, Performance with Data Structures", and was surprised by the comment that in:

#include <string>
#include <unordered_map>
#include <memory>

struct Foo {
  int x;
};

Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
  if (cache[key])
    return cache[key].get();

  cache[key] = std::unique_ptr<Foo>(new Foo());
  return cache[key].get();
}

Foo* getFooBetter(std::string key,
                  std::unordered_map<std::string,
                                     std::unique_ptr<Foo>> &cache) {
  std::unique_ptr<Foo> &entry = cache[key];
  if (entry)
    return entry.get();

  entry = std::unique_ptr<Foo>(new Foo());
  return entry.get();
}

getFooBetter() is better. I had been under the belief that I could rely on the compiler for performing this sort of transformation in the same way that I'd expect multiple occurrences of x+y to be evaluated only once. Unsurprisingly, the generated LLVM IR indeed agrees with the presenter. Even with -O9, we're left with 3 calls to cache[key] in the getFoo() version.

I've moved the long LLVM IR of both with c++ symbols unmangled out of line so as not to be visually offensive.

Another StackOverflow question reveals that part of the answer here is that operator[] is assumed to be able to modify whatever global state it wishes, and thus we can't elide calls. A linked proposal about introducing a [[pure]] annotation talks about its applications to CSE.

If we stayed at 4 calls, I'd be able to end here feeling satisfied. However, if my reading of the IR is correct, it looks like we optimized getFoo() into as if we wrote:

Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
  if (cache[key])
    return cache[key].get();

  std::unique_ptr<Foo> &entry = cache[key];
  entry = std::unique_ptr<Foo>(new Foo());
  return entry.get();
}

Would someone be able to explain what clang's view of the code is such that it was able to merge the two last cache[key]s, but not all of them? (My local clang is 3.4.)

What is common subexpression and how do you eliminate it?

In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a single variable holding the computed value.

Which of the following is an example of common subexpression elimination?

(D) x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination.

How do you implement common subexpression elimination?

To implement common subexpression elimination we traverse the program, look- ing for definitions l : x ← s1⊙s2. If s1⊙s2 is already in the table, defining variable y at k, we replace l with l : x ← y if k dominates l. Otherwise, we add the expression, line, and variable to the hash table.

What identifies the common subexpression in the expression?

The expression or sub-expression that has been appeared and computed before and appears again during the computation of the code is the common sub-expression. Elimination of that sub-expression is known as Common sub-expression elimination.

CSE implementation in llvm is operating on Arithmatic Expressions. You can see llvm Common Subexpression Elimination source code in llvm/lib/Transforms/Scalar/EarlyCSE.cpp

Case we face here is Interprocedural Optimizations.

This call cache[key] turn out to be [](cache,key) function. so optimizations such as inlining may come into action depending upon cost of inlining of [] function. Chandler was mentioning same issue, given hash function is computationally expensive, inlining is prevented, one ends up computing hash function more than once!

In case inlining happened, IR at -O3, cache[key] was computed first and given cache key has not mutated at all such call will be optimized to same SSA value.

In case of cache[key].get(), we would normally write IR as cache[key] returns object and getting value of field using getelementpointer inside get() . With optimization turned on this IR turn out to be our previously calculated SSA value for 'cache[key]` with element accessing from struct of unique pointer.

Coming Back to getFooBetter() in worst case if compiler fails to do optimizations across procedures, more occurrences of cache[key] will result in more computation and this call even at O3 will appear as it is!

Limitations of Common Subexpression Elimination in C++

Tags:

c++

compiler-optimization

clang

Alex Miller

People also ask

1 Answers

Mahesh Attarde

Recent Activity

Donate For Us

Limitations of Common Subexpression Elimination in C++

Tags:

c++

compiler-optimization

clang

Alex Miller

People also ask

1 Answers

Mahesh Attarde

Related questions

Recent Activity

Donate For Us