I was toying around with something that uses Euclid's algorithm to compute the GCD of two numbers. I implemented the standard one-liner as usual, and it worked fine. It's used in an algorithm that computes a series and calls gcd() several times per element as n gets larger. I decided to see if I could do better by memoizing, so here is what I tried:
size_t const gcd(size_t const a, size_t const b) {
    return b == 0 ? a : gcd(b, a % b);
}

struct memoized_gcd : private std::unordered_map<unsigned long long, size_t> {
    size_t const operator()(size_t const a, size_t const b) {
        unsigned long long const key = (static_cast<unsigned long long>(a) << 32) | b;
        if (find(key) == end()) (*this)[key] = b == 0 ? a : (*this)(b, a % b);
        return (*this)[key];
    }
};
//std::function<size_t (size_t, size_t)> gcd_impl = gcd<size_t,size_t>;
std::function<size_t (size_t, size_t)> gcd_impl = memoized_gcd();
I call the chosen function through the std::function instance later. Interestingly, when n = 10,000, for example, the calculation runs in 8 seconds on this computer with the plain version, while with the memoized version it takes close to a minute, everything else being equal.
Have I missed something obvious? I am using key as an expedient so that I don't need to specialize std::hash for the hash map. The only things I can think of are that the memoized version doesn't get the tail-call optimization (TCO) that gcd() does, or that calling through the std::function is slow for the functor (even though I use it for both), or perhaps that I'm just being dense. Gurus, show me the way.
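For reference, here is a minimal, self-contained sketch of the kind of timing comparison I mean. The series computation itself isn't shown above, so the call pattern, the loop bounds, and the time_it helper below are placeholders; the memoized functor is also restructured slightly (look up, compute, then store) but uses the same 64-bit key packing:

#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <unordered_map>

size_t gcd_plain(size_t a, size_t b) {
    return b == 0 ? a : gcd_plain(b, a % b);
}

struct memoized_gcd_sketch {
    std::unordered_map<unsigned long long, size_t> cache;

    size_t operator()(size_t a, size_t b) {
        // Same key packing as above: both arguments in one 64-bit key.
        unsigned long long const key = (static_cast<unsigned long long>(a) << 32) | b;
        auto const it = cache.find(key);
        if (it != cache.end()) return it->second;
        size_t const result = b == 0 ? a : (*this)(b, a % b);
        cache[key] = result;
        return result;
    }
};

// Placeholder driver: calls f on a grid of argument pairs and returns wall-clock seconds.
double time_it(std::function<size_t(size_t, size_t)> const& f, size_t n) {
    auto const start = std::chrono::steady_clock::now();
    size_t sink = 0;  // keep the results live so the calls aren't optimized away
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= n; j += 101)  // arbitrary stride, not the real series
            sink += f(i, j);
    auto const stop = std::chrono::steady_clock::now();
    std::cout << "  (checksum " << sink << ")\n";
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    std::function<size_t(size_t, size_t)> gcd_impl = gcd_plain;
    double const t_plain = time_it(gcd_impl, 2000);
    std::cout << "plain:    " << t_plain << " s\n";

    gcd_impl = memoized_gcd_sketch();
    double const t_memo = time_it(gcd_impl, 2000);
    std::cout << "memoized: " << t_memo << " s\n";
}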
Notes
I've tried this on win32 and win64 with g++ 4.7.0 and linux x86 with g++ 4.6.1 and 4.7.1.
I also tried a version with a std::map<std::pair<size_t, size_t>, size_t> that had comparable performance to the unmemoized version.
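That variant isn't shown above; as an assumption about its shape, it was something along the lines of the sketch below (std::pair already provides operator<, so no std::hash specialization and no key packing are needed):

#include <cstddef>
#include <map>
#include <utility>

struct memoized_gcd_map {
    std::map<std::pair<size_t, size_t>, size_t> cache;

    size_t operator()(size_t a, size_t b) {
        auto const key = std::make_pair(a, b);
        auto const it = cache.find(key);
        if (it != cache.end()) return it->second;
        size_t const result = b == 0 ? a : (*this)(b, a % b);
        cache[key] = result;
        return result;
    }
};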
The main issue with your version of GCD is that it may use up huge amounts of memory, depending on the usage pattern.
For example, if you compute GCD(a,b) for all pairs 0 <= a < 10,000, 0 <= b < 10,000, the memoization table will end up with 100,000,000 entries. Since on x86 each entry is 12 bytes, the hash table will take up at least 1.2 GB of memory. Working with that amount of memory is going to be slow.
And of course, if you evaluate GCD with values >=10,000, you can make the table arbitrarily large... at least until you run out of the address space or the commit limit.
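If you want to sanity-check those figures, a tiny sketch like this prints the size of one stored element and the resulting payload for the 10,000 x 10,000 case (the real footprint is higher still, because this ignores the per-node and per-bucket overhead of the hash table):

#include <cstddef>
#include <iostream>
#include <unordered_map>

int main() {
    // One stored element is std::pair<const unsigned long long, size_t>:
    // 8 + 4 = 12 bytes on 32-bit x86 (16 on a typical 64-bit target).
    using entry = std::unordered_map<unsigned long long, size_t>::value_type;
    double const entries = 10000.0 * 10000.0;  // all pairs with a, b < 10,000
    std::cout << "sizeof(entry) = " << sizeof(entry) << " bytes\n";
    std::cout << "payload only  = " << entries * sizeof(entry) / 1e9
              << " GB (node and bucket overhead not included)\n";
}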
Summary: In general, memoizing GCD is a bad idea because it leads to unbounded memory usage.
There are some finer points that could be discussed, but as I said, it is not surprising at all that the algorithm you posted does not perform well.