Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determining if an unordered vector<T> has all unique elements

Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done:

The first using a set:

template <class T> bool is_unique(vector<T> X) {   set<T> Y(X.begin(), X.end());   return X.size() == Y.size(); } 

The second looping over the elements:

template <class T> bool is_unique2(vector<T> X) {   typename vector<T>::iterator i,j;   for(i=X.begin();i!=X.end();++i) {     for(j=i+1;j!=X.end();++j) {       if(*i == *j) return 0;     }   }   return 1; } 

I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.

Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?

like image 871
Hooked Avatar asked May 04 '10 21:05

Hooked


1 Answers

Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.

The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.

It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.

template< class T > bool dereference_less( T const *l, T const *r )  { return *l < *r; }   template <class T> bool is_unique(vector<T> const &x) {     vector< T const * > vp;     vp.reserve( x.size() );     for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );     sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)     return adjacent_find( vp.begin(), vp.end(),            not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"         == vp.end(); // if no adjacent pair (vp_n,vp_n+1) has *vp_n < *vp_n+1 } 

or in STL style,

template <class I> bool is_unique(I first, I last) {     typedef typename iterator_traits<I>::value_type T;     … 

And if you can reorder the original vector, of course,

template <class T> bool is_unique(vector<T> &x) {     sort( x.begin(), x.end() ); // O(N log N)     return adjacent_find( x.begin(), x.end() ) == x.end(); } 
like image 130
Potatoswatter Avatar answered Oct 04 '22 19:10

Potatoswatter