What's the most efficient way to erase duplicates and sort a vector?

I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.

I currently have the code below, but it doesn't work.

vec.erase(
      std::unique(vec.begin(), vec.end()),
      vec.end());
std::sort(vec.begin(), vec.end());

How can I correctly do this?

Additionally, is it faster to erase the duplicates first (as in the code above) or to perform the sort first? If I do perform the sort first, is the vector guaranteed to remain sorted after std::unique is executed?

Or is there another (perhaps more efficient) way to do all this?

Kyle Ryan asked Jun 25 '09 00:06

1 Answer

I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let's compare three approaches:

Just using vector, sort + unique

// sort first so that equal elements become adjacent, then drop the repeats
std::sort( vec.begin(), vec.end() );
vec.erase( std::unique( vec.begin(), vec.end() ), vec.end() );
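For what it's worth, the order matters because std::unique only removes consecutive equal elements, so it does nothing useful on unsorted data. It also keeps the surviving elements in their original relative order, so a sorted vector stays sorted after it. Here is a minimal sketch contrasting the two orders; the helper names sort_and_dedupe and dedupe_then_sort are made up for illustration and are not from the original post.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sort first, then erase the now-adjacent duplicates.
template <typename T>
void sort_and_dedupe(std::vector<T>& vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

// Hypothetical helper: the order used in the question (unique, then sort).
// std::unique only collapses runs of equal neighbours, so duplicates that
// are not next to each other survive and are merely sorted afterwards.
template <typename T>
void dedupe_then_sort(std::vector<T>& vec) {
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
    std::sort(vec.begin(), vec.end());
}
```

With the input {3, 1, 3, 2, 1}, sort_and_dedupe leaves {1, 2, 3}, while dedupe_then_sort leaves {1, 1, 2, 3, 3} because no two equal elements were adjacent when std::unique ran.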

Convert to set (manually)

std::set<int> s;
const std::size_t size = vec.size();
for( std::size_t i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );

Convert to set (using a constructor)

std::set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );
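Both set-based variants rely on std::set storing each key once, in sorted order, so copying the set back into the vector gives a sorted, duplicate-free result. A self-contained sketch of the two variants as function templates follows; the names dedupe_via_set_insert and dedupe_via_set_ctor are hypothetical, and the element type is assumed to be copyable and comparable with operator<.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: insert the elements one at a time, then copy back.
template <typename T>
void dedupe_via_set_insert(std::vector<T>& vec) {
    std::set<T> s;
    for (const T& value : vec) {
        s.insert(value);  // inserting a value already in the set is a no-op
    }
    vec.assign(s.begin(), s.end());  // set iteration order is ascending
}

// Hypothetical helper: let the range constructor build the set.
template <typename T>
void dedupe_via_set_ctor(std::vector<T>& vec) {
    std::set<T> s(vec.begin(), vec.end());
    vec.assign(s.begin(), s.end());
}
```

Either function can be called directly on a std::vector<int>, for example dedupe_via_set_ctor(vec).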

Here's how these perform as the number of duplicates changes:

[Figure: comparison of vector and set approaches]

Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.

And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.
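The benchmark code behind the chart isn't included in the post, so here is only a rough sketch of how one might set up a similar comparison. Everything in it is an assumption: the helpers make_random_data and time_ms, the bench_* wrappers, and the idea of controlling the duplicate rate by drawing ints from a small range. It won't reproduce the original numbers, and results will depend on the compiler, optimization flags, and data.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: n random ints drawn from [0, distinct).
// A smaller `distinct` relative to n means more duplicates.
std::vector<int> make_random_data(std::size_t n, int distinct, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, distinct - 1);
    std::vector<int> data(n);
    for (int& x : data) x = dist(rng);
    return data;
}

// Hypothetical helper: runs `dedupe` on a private copy of the input (taken
// by value, so the copy is made before the clock starts) and returns the
// elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F dedupe, std::vector<int> input) {
    const auto start = std::chrono::steady_clock::now();
    dedupe(input);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// The three approaches from the answer, wrapped as functions.
void bench_sort_unique(std::vector<int>& vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

void bench_set_manual(std::vector<int>& vec) {
    std::set<int> s;
    const std::size_t size = vec.size();
    for (std::size_t i = 0; i < size; ++i) s.insert(vec[i]);
    vec.assign(s.begin(), s.end());
}

void bench_set_ctor(std::vector<int>& vec) {
    std::set<int> s(vec.begin(), vec.end());
    vec.assign(s.begin(), s.end());
}
```

A call such as time_ms(bench_sort_unique, make_random_data(1000000, 100, 42u)) times one run. For anything beyond a quick look, it is worth repeating each measurement several times and sweeping `distinct` to vary the share of duplicates.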

Nate Kohl answered Oct 19 '22 09:10
