Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What container to store unique values?

I've got the following problem. I have a game which runs on average 60 frames per second. Each frame I need to store values in a container and there must be no duplicates.

It probably has to store less than 100 items per frame, but the number of insert-calls will be alot more (and many rejected due to it has to be unique). Only at the end of the frame do I need to traverse the container. So about 60 iterations of the container per frame, but alot more insertions.

Keep in mind the items to store are simple integer.

There are a bunch of containers I can use for this but I cannot make up my mind what to pick. Performance is the key issue for this.

Some pros/cons that I've gathered:


vector

  • (PRO): Contigous memory, a huge factor.
  • (PRO): Memory can be reserved first, very few allocations/deallocations afterwards
  • (CON): No alternative than to traverse the container (std::find) each insert() to find unique keys? The comparison is simple though (integers) and the whole container can probably fit the cache

set

  • (PRO): Simple, clearly meant for this
  • (CON): Not constant insert-time
  • (CON): Alot of allocations/deallocations per frame
  • (CON): Not contigous memory. Traversing a set of hundreds of objects means jumping around alot in memory.

unordered_set

  • (PRO): Simple, clearly meant for this
  • (PRO): Average case constant time insert
  • (CON): Seeing as I store integers, hash operation is probably alot more expensive than anything else
  • (CON): Alot of allocations/deallocations per frame
  • (CON): Not contigous memory. Traversing a set of hundreds of objects means jumping around alot in memory.

I'm leaning on going the vector-route because of memory access patterns, even though set is clearly meant for this issue. The big issue that is unclear to me is whether traversing the vector for each insert is more costly than the allocations/deallocations (especially considering how often this must be done) and the memory lookups of set.

I know ultimately it all comes down to profiling each case, but if nothing else than as a headstart or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might've missed aswell?

EDIT: As I didnt mention, the container is cleared() at the end of each frame

like image 262
KaiserJohaan Avatar asked Feb 27 '15 14:02

KaiserJohaan


People also ask

Which datatype helps to store unique values?

Set data structure is used to store unique values only, meaning no duplicate values would be stored in a set. When a HashSet is created, it internally implements a HashMap. An element can be inserted into the HashSet using the 'add' function.

Which of these containers are used for finding unique values efficiently?

Container library as flat_set .

Which object is best suited to store unique values in JavaScript?

Conclusion. The Set object is a great tool to have in your arsenal. Its ability to store unique values, especially when converting from other data types, makes it a great choice for quickly pulling out information that can be bogged down through repetitive noise.

Which STL does not allow duplicates?

So, the conclusion, use std::unordered_set or std::unordered_map (if you need the key-value feature). And you don't need to check before doing the insertion, these are unique-key containers, they don't allow duplicates.


1 Answers

I did timing with a few different methods that I thought were likely candidates. Using std::unordered_set was the winner.

Here are my results:

Using UnorderedSet:    0.078s
Using UnsortedVector:  0.193s
Using OrderedSet:      0.278s
Using SortedVector:    0.282s

Timing is based on the median of five runs for each case.

compiler: gcc version 4.9.1
flags:    -std=c++11 -O2
OS:       ubuntu 4.9.1
CPU:      Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz

Code:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

using std::cerr;
static const size_t n_distinct = 100;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
  auto distribution = std::uniform_int_distribution<int>(0,n_distinct);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<int>();
  std::generate_n(std::back_inserter(vec),n,generator);
  return vec;
}


struct UnsortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "UnsortedVector"; }
  UnsortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::find(values.begin(),values.end(),new_value);
    if (iter!=values.end()) return;
    values.push_back(new_value);
  }
};


struct SortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "SortedVector"; }
  SortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::lower_bound(values.begin(),values.end(),new_value);
    if (iter==values.end()) {
      values.push_back(new_value);
      return;
    }
    if (*iter==new_value) return;
    values.insert(iter,new_value);
  }
};

struct OrderedSetSmallSet {
  std::set<int> values;
  static const char *name() { return "OrderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};

struct UnorderedSetSmallSet {
  std::unordered_set<int> values;
  static const char *name() { return "UnorderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};



int main()
{
  //using SmallSet = UnsortedVectorSmallSet;
  //using SmallSet = SortedVectorSmallSet;
  //using SmallSet = OrderedSetSmallSet;
  using SmallSet = UnorderedSetSmallSet;

  auto engine = std::default_random_engine();

  std::vector<int> values_to_insert = randomInts(engine,10000000);
  SmallSet small_set;
  namespace chrono = std::chrono;
  using chrono::system_clock;
  auto start_time = system_clock::now();
  for (auto value : values_to_insert) {
    small_set.insert(value);
  }
  auto end_time = system_clock::now();
  auto& result = small_set.values;

  auto sum = std::accumulate(result.begin(),result.end(),0u);
  auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();

  cerr << "Using " << SmallSet::name() << ":\n";
  cerr << "  sum=" << sum << "\n";
  cerr << "  elapsed: " << elapsed_seconds << "s\n";
}
like image 172
Vaughn Cato Avatar answered Oct 13 '22 18:10

Vaughn Cato