 

Which data structure should I use for my purpose? [duplicate]

I need a data structure just like a map, but where each key may have multiple values associated with it, and where I can get all the values corresponding to a single key back as an array of objects. Which data structure would be best for this? I don't need to search within the data structure; I just need fast access to all the values corresponding to a particular key. I have looked at std::multimap, but it doesn't give me all the values for a particular key as a single array. So which data structure in C++ would be the best to use?

asked Jun 07 '13 by ab_11


2 Answers

I need a data structure just like a map but...

std::map<key, std::vector<value>>
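
A minimal sketch of that approach (std::string keys and int values here are just placeholders; substitute your real key and value types):

#include <map>
#include <string>
#include <vector>

int main()
{
    // each key maps to a growable, contiguous array of values
    std::map<std::string, std::vector<int>> m;

    m["a"].push_back(1);   // operator[] default-constructs the vector on first use
    m["a"].push_back(2);
    m["b"].push_back(3);

    // all values for key "a" come back as one contiguous array
    const std::vector<int>& values_for_a = m["a"];
}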

80 million points is a good whack - worth considering other options. Ones worth a little thought/experimentation/benchmarking include:

  • sparse direct indexing... to achieve that, you need enough memory not just for the 80 million data points, but for the entire x/y/z space they span; you can then do an [x][y][z] lookup to find the vector of cell ids. That is obviously going to be huge - whether it's doable or desirable is not clear from your problem description

  • a sorted vector... depending on the order/overlap of your insertions and lookups, and whether you can afford a std::map-to-std::vector compaction step, you could sort a std::vector of (x,y,z) values and then have binary_search outperform std::map due to the vector's contiguous memory usage (see the sketch after this list)

  • std::unordered_map<key, std::vector<value>>... presizing to, say, 100 million buckets should speed insertion a bit. This could be slower or faster than the other options: the index likely needs fewer pages of memory than sparse indexing but more than a binary_search over contiguous memory, and it visits the fewest memory pages per lookup; however, with normal hash techniques you'll hit effectively random (but repeatable) buckets even when the x,y,z coordinates differ only a little, so cache hits could be worse than with all the options above.
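
To make the sorted-vector idea concrete, here is a rough sketch (the Record layout, field names and coordinate types are made up for illustration): keep all (x, y, z, id) entries in one std::vector, sort it once, and use std::equal_range to find the contiguous run of ids for a given coordinate.

#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

struct Record {
    int32_t x, y, z;   // the coordinate acting as the "key"
    int64_t cell_id;   // the associated value
};

// order records by coordinate so equal keys end up adjacent
static bool by_coord(const Record& a, const Record& b)
{
    return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}

int main()
{
    std::vector<Record> records = { {1, 2, 3, 10}, {1, 2, 3, 11}, {4, 5, 6, 12} };

    // one-off compaction step: O(N log N)
    std::sort(records.begin(), records.end(), by_coord);

    // binary search for all records whose coordinate equals (1,2,3)
    Record probe{1, 2, 3, 0};
    auto range = std::equal_range(records.begin(), records.end(), probe, by_coord);

    for (auto it = range.first; it != range.second; ++it)
    {
        // it->cell_id is one of the values for key (1,2,3);
        // the matching records are contiguous in memory
    }
}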

Actual benchmarks are always the best way to tune, preferably with a profiler run to confirm the costs are where you expect them to be.

answered by Tony Delroy


The answer by @TonyD is certainly fine, but there are some tradeoffs compared to

std::multimap<key, value> 

Searching for all values for a given key gives you the same O(log N) complexity:

auto result = my_multimap.equal_range(my_key);

Iteration over the matching range is still O(N):

for (auto it = result.first; it != result.second; ++it)
     // it->second is one of the values stored under my_key

However, in all real-world std::multimap implementations, the iteration above does node-based pointer chasing over the "consecutive" value elements, rather than the contiguous iteration you get with a std::vector-based std::map. This might matter for cache-locality reasons.
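
For completeness, a small self-contained version of the equal_range lookup above (int keys and values are just stand-ins for the real types):

#include <map>
#include <vector>

int main()
{
    std::multimap<int, int> my_multimap;

    // values with the same key live in separate tree nodes
    my_multimap.insert({42, 1});
    my_multimap.insert({42, 2});
    my_multimap.insert({7, 3});

    // O(log N) to find the range of entries with key 42
    auto result = my_multimap.equal_range(42);

    // O(K) walk over the K matching entries; copying them out gives
    // the "array of values" the question asks for
    std::vector<int> values;
    for (auto it = result.first; it != result.second; ++it)
        values.push_back(it->second);
}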

The main drawback that I can see in the std::vector solution is that you are committing to keeping all values together, which might impose some overhead depending on how often you copy your data around.

The multimap approach also makes it easier to insert/extract a single value from the container:

my_multimap.insert(std::make_pair(some_key, another_value));

versus

auto it = my_map.find(some_key);
if (it != my_map.end()) 
    it->second.push_back(another_value);
else
    my_map.insert(std::make_pair(some_key, std::vector<value>{another_value}));

You should probably benchmark your program to see which container suits you better.

answered by TemplateRex