Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Limitations of Common Subexpression Elimination in C++

I was watching a talk, "Efficiency with Algorithms, Performance with Data Structures", and was surprised by the comment that in:

#include <string>
#include <unordered_map>
#include <memory>

struct Foo {
  int x;
};

Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
  if (cache[key])
    return cache[key].get();

  cache[key] = std::unique_ptr<Foo>(new Foo());
  return cache[key].get();
}

Foo* getFooBetter(std::string key,
                  std::unordered_map<std::string,
                                     std::unique_ptr<Foo>> &cache) {
  std::unique_ptr<Foo> &entry = cache[key];
  if (entry)
    return entry.get();

  entry = std::unique_ptr<Foo>(new Foo());
  return entry.get();
}

getFooBetter() is better. I had been under the belief that I could rely on the compiler for performing this sort of transformation in the same way that I'd expect multiple occurrences of x+y to be evaluated only once. Unsurprisingly, the generated LLVM IR indeed agrees with the presenter. Even with -O9, we're left with 3 calls to cache[key] in the getFoo() version.

I've moved the long LLVM IR of both with c++ symbols unmangled out of line so as not to be visually offensive.

Another StackOverflow question reveals that part of the answer here is that operator[] is assumed to be able to modify whatever global state it wishes, and thus we can't elide calls. A linked proposal about introducing a [[pure]] annotation talks about its applications to CSE.

If we stayed at 4 calls, I'd be able to end here feeling satisfied. However, if my reading of the IR is correct, it looks like we optimized getFoo() into as if we wrote:

Foo* getFoo(std::string key,
            std::unordered_map<std::string,
                               std::unique_ptr<Foo>> &cache) {
  if (cache[key])
    return cache[key].get();

  std::unique_ptr<Foo> &entry = cache[key];
  entry = std::unique_ptr<Foo>(new Foo());
  return entry.get();
}

Would someone be able to explain what clang's view of the code is such that it was able to merge the two last cache[key]s, but not all of them? (My local clang is 3.4.)

like image 521
Alex Miller Avatar asked May 29 '15 04:05

Alex Miller


People also ask

What is common subexpression and how do you eliminate it?

In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a single variable holding the computed value.

Which of the following is an example of common subexpression elimination?

(D) x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination.

How do you implement common subexpression elimination?

To implement common subexpression elimination we traverse the program, look- ing for definitions l : x ← s1⊙s2. If s1⊙s2 is already in the table, defining variable y at k, we replace l with l : x ← y if k dominates l. Otherwise, we add the expression, line, and variable to the hash table.

What identifies the common subexpression in the expression?

The expression or sub-expression that has been appeared and computed before and appears again during the computation of the code is the common sub-expression. Elimination of that sub-expression is known as Common sub-expression elimination.


1 Answers

CSE implementation in llvm is operating on Arithmatic Expressions. You can see llvm Common Subexpression Elimination source code in llvm/lib/Transforms/Scalar/EarlyCSE.cpp

Case we face here is Interprocedural Optimizations.

This call cache[key] turn out to be [](cache,key) function. so optimizations such as inlining may come into action depending upon cost of inlining of [] function. Chandler was mentioning same issue, given hash function is computationally expensive, inlining is prevented, one ends up computing hash function more than once!

In case inlining happened, IR at -O3, cache[key] was computed first and given cache key has not mutated at all such call will be optimized to same SSA value.

In case of cache[key].get(), we would normally write IR as cache[key] returns object and getting value of field using getelementpointer inside get() . With optimization turned on this IR turn out to be our previously calculated SSA value for 'cache[key]` with element accessing from struct of unique pointer.

Coming Back to getFooBetter() in worst case if compiler fails to do optimizations across procedures, more occurrences of cache[key] will result in more computation and this call even at O3 will appear as it is!

like image 186
Mahesh Attarde Avatar answered Oct 12 '22 19:10

Mahesh Attarde