I have a bunch of data full of duplicates and I want to eliminate the duplicates. You know, e.g. [1, 1, 3, 5, 5, 5, 7] becomes [1, 3, 5, 7].
It looks like I can use either std::map or std::set to handle this. However I'm not sure whether it's faster to (a) simply insert all the values into the container, or (b) check whether they already exist in the container and only insert if they don't - are inserts very efficient? Even if there's a better way... can you suggest a fast way to do this?
Another question - if the data I'm storing in them isn't as trivial as integers, and instead is a custom class, how does the std::map manage to properly store (hash?) the data for fast access via operator[]?
std::map
doesn't use hashing. std::unordered_map
does, but that's C++11. std::map
and std::set
both use a comparator that you provide. The class templates have defaults for this comparator, which boils down to an operator<
comparison, but you can provide your own.
If you don't need both a key and a value to be stored (looks like you don't) you should just use a std::set
, as that's more appropriate.
The Standard doesn't say what data structures map
s and set
s use under the hood, only that certian actions have certain time complexities. In reality, most implementations I'm aware of use a tree.
It makes no difference time-complexity-wise if you use operator[]
or insert
, but I would use insert
or operator[]
before I did a search
followed by an insert
if the item isn't found. The later would imply two seperate searches to insert an item in to the set.
An insert()
on any of the associated containers does a find()
to see if the object exists and then inserts the object. Simply inserting the elements into an std::set<T>
should get rid of the duplicates reasonably efficiently.
Depending on the size of your set and the ratio of duplicates to unique values, it may be faster to put the objects into std::vector<T>
, std::sort()
then, and then use std::unique()
together with std::vector<T>::erase()
to get rid of the duplicates.
How many times should you do it?
If insert is usual:
//*/
std::set<int> store;
/*/
// for hash:
std::unordered_set<int> store;
//*/
int number;
if ( store.insert(number).second )
{
// was not in store
}
If you fill once:
std::vector<int> store;
int number;
store.push_back(number);
std::sort(store.begin(),store.end());
store.erase(std::unique(store.begin(),store.end()),store.end() );
// elements are unique
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With