Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a linked hash set in C++?

Tags:

c++

set

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?

Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.

like image 491
Victor Lyuboslavsky Avatar asked Apr 03 '13 23:04

Victor Lyuboslavsky


2 Answers

If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.

Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.

The problems with what you're currently doing (set and vector) are:

  • Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
  • Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
  • set has different performance characteristics from a hash set.

If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.

like image 151
Steve Jessop Avatar answered Nov 13 '22 13:11

Steve Jessop


To replicate LinkedHashSet from Java in C++, I think you will need two vanilla std::map (please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.

  • One uses actual value as key and insertion order (usually int or long int) as value.
  • Another ones is the reverse, uses insertion order as key and actual value as value.

When you are going to insert, you use std::map::find in the first std::map to make sure that there is no identical object exists in it.

  • If there is already exists, ignore the new one.
  • If it does not, you map this object with the incremented insertion order to both std::map I mentioned before.

When you are going to iterate through this by order of insertion, you iterate through the second std::map since it will be sorted by insertion order (anything that falls into the std::map or std::set will be sorted automatically).

When you are going to remove an element from it, you use std::map::find to get the order of insertion. Using this order of insertion to remove the element from the second std::map and remove the object from the first one.

Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int). In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.

If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, you won't be able to replace the second one with it though. The reason is that std::unordered map uses a hash code for indexing so that the index cannot be reliably sorted in this situation.

like image 31
Truthseeker Rangwan Avatar answered Nov 13 '22 13:11

Truthseeker Rangwan