Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is std::set slower than std::map?

I was trying to solve this problem from acm.timus.ru which basically wants me to output the number of different substrings of a given string (max length 5000).

The solutions I am about to present are desperately inefficient and doomed for Time Limit Exceeded verdict given the constraints. However, the only way in which these two solutions differ (at least as far as I can see/understand) is that one uses std::map<long long, bool>, while the other uses std::set <long long> (see the beginning of the last for loop. The rest is identical, you can check by any diff tool). The map solution results in "Time Limit Exceeded on Test 3", whereas the set solution results in "Time Limit Exceeded on Test 2", which means that Test 2 is such that the map solution works faster on it than the set solution. This is the case if I choose Microsoft Visual Studio 2010 compiler. If I choose GCC, then both solutions result in TLE on test 3.

I am not asking for how to solve the problem efficiently. What I am asking is how can one explain that using std::map can apparently be more efficient than using std::set. I just fail to see the mechanics of this phenomenon and hope that someone could have any insights.

Code1 (uses map, TLE 3):

#include <iostream>
#include <map>
#include <string>
#include <vector>

using namespace std;

int main ()
{
   string s;
   cin >> s;
   vector <long long> p;
   p.push_back(1);
   for (int i = 1; i < s.size(); i++)
      p.push_back(31 * p[i - 1]);
   vector <long long> hash_temp;
   hash_temp.push_back((s[0] - 'a' + 1) * p[0]);
   for (int i = 1; i < s.size(); i++)
      hash_temp.push_back((s[i] - 'a' + 1) * p[i] + hash_temp[i - 1]);   
   int n = s.size();   
   int answer = 0;
   for (int i = 1; i <= n; i++)
   {
      map <long long, bool> hash_ans;
      for (int j = 0; j < n - i + 1; j++)
      {
         if (j == 0)
            hash_ans[hash_temp[j + i - 1] * p[n - j - 1]] = true;
         else
            hash_ans[(hash_temp[j + i - 1] - hash_temp[j - 1]) * p[n - j - 1]] = true;
      }
      answer += hash_ans.size();
   }
   cout << answer;
}

Code2 (uses set, TLE 2):

#include <iostream>
#include <string>
#include <vector>
#include <set>

using namespace std;

int main ()
{
   string s;
   cin >> s;
   vector <long long> p;
   p.push_back(1);
   for (int i = 1; i < s.size(); i++)
      p.push_back(31 * p[i - 1]);
   vector <long long> hash_temp;
   hash_temp.push_back((s[0] - 'a' + 1) * p[0]);
   for (int i = 1; i < s.size(); i++)
      hash_temp.push_back((s[i] - 'a' + 1) * p[i] + hash_temp[i - 1]);   
   int n = s.size();   
   int answer = 0;
   for (int i = 1; i <= n; i++)
   {
      set <long long> hash_ans;
      for (int j = 0; j < n - i + 1; j++)
      {
         if (j == 0)
            hash_ans.insert(hash_temp[j + i - 1] * p[n - j - 1]);
         else
            hash_ans.insert((hash_temp[j + i - 1] - hash_temp[j - 1]) * p[n - j - 1]);
      }
      answer += hash_ans.size();
   }
   cout << answer;
}
like image 577
Armen Tsirunyan Avatar asked Apr 12 '13 12:04

Armen Tsirunyan


2 Answers

The actual differences I see (tell me if I missed anything) are that in the map case you do

hash_ans[key] = true;

while in the set case you do

hash_ans.insert(key);

In both cases, an element is inserted, unless it already exists, in which it does nothing. In both cases, the lookup needs to locate the according element and insert it on failure. In effectively every implementation out there, the containers will use a tree, making the lookup equally expensive. Even more, the C++ standard actually requires set::insert() and map::operator[]() to be O(log n) in complexity, so the complexity of both implementations should be the same.

Now, what could be the reason that one performs better? One difference is that in one case a node of the underlying tree contains a string, while in the other it's a pair<string const, bool>. Since the pair contains a string, it must be larger and put more pressure on the RAM interface of the machine, so this doesn't explain the speedup. What it could do is enlarge the node size so that other nodes are pushed off the cache line, which can be bad for performance in multi-core system.

In summary, there's some things I'd try:

  1. use the same data in the set
    I'd do this with struct data: string {bool b}; i.e. bundle the string in a struct that should have a similar binary layout as the map's elements. As comparator, use less<string>, so that only the string actually takes part in the comparisons.

  2. use insert() on the map
    I don't think this should be an issue, but the insert could incur a copy of the argument, even if no insert takes place in the end. I would hope that it doesn't though, so I'm not too confident this will change anything.

  3. turn off debugging
    Most implementations have a diagnostic mode, where iterators are validated. You can use this to catch errors where C++ only says "undefined behaviour", shrugs its shoulders and crashes on you. This mode often doesn't meet complexity guarantees and it always has some overhead.

  4. read the code
    If the implementations for set and map have different levels of quality and optimization, this could explain the differences. Under the hood, I'd expect both map and set to be built on the same type of tree though, so not much hope here either.

like image 113
Ulrich Eckhardt Avatar answered Oct 23 '22 14:10

Ulrich Eckhardt


A set is only a little bit faster than a map in this case I guess. Still I don't think you should case that much as TLE 2 or TLE 3 is not really a big deal. It may happen if you are clsoe to the time limit that the same solution time limits on test 2 on a given submit and next time it time limits on test 3. I have some solutions passing the tests just on the time limit and I bet if I resubmit them they will fail.

This particular problem I solved using a Ukonen Sufix tree.

like image 20
Ivaylo Strandjev Avatar answered Oct 23 '22 14:10

Ivaylo Strandjev