Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep the duplicated values only - Vectors C++

Tags:

c++

vector

Assume I have a vector with the following elements {1, 1, 2, 3, 3, 4} I want to write a program with c++ code to remove the unique values and keep only the duplicated once. So the end result will be something like this {1,3}.

So far this is what I've done, but it takes a lot of time, Is there any way this can be more efficient,

vector <int> g1 = {1,1,2,3,3,4}
vector <int> g2;

for(int i = 0; i < g1.size(); i++)
{
  if(count(g1.begin(), g1.end(), g1[i]) > 1)
    g2.push_back(g1[i]);

}

v.erase(std::unique(g2.begin(), g2.end()), g2.end());

for(int i = 0; i < g2.size(); i++)
{
  cout << g2[i];
}
like image 818
Mohamed Abdel Aziz Avatar asked Dec 14 '22 10:12

Mohamed Abdel Aziz


2 Answers

My approach is to create an <algorithm>-style template, and use an unordered_map to do the counting. This means you only iterate over the input list once, and the time complexity is O(n). It does use O(n) extra memory though, and isn't particularly cache-friendly. Also this does assume that the type in the input is hashable.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>

template <typename InputIt, typename OutputIt>
OutputIt copy_duplicates(
        InputIt  first,
        InputIt  last,
        OutputIt d_first)
{
    std::unordered_map<typename std::iterator_traits<InputIt>::value_type,
                       std::size_t> seen;
    for ( ; first != last; ++first) {
        if ( 2 == ++seen[*first] ) {
            // only output on the second time of seeing a value
            *d_first = *first;
            ++d_first;
        }
    }
    return d_first;
}

int main()
{
    int i[] = {1, 2, 3, 1, 1, 3, 5}; // print 1, 3,
    //                  ^     ^
    copy_duplicates(std::begin(i), std::end(i),
                    std::ostream_iterator<int>(std::cout, ", "));
}

This can output to any kind of iterator. There are special iterators you can use that when written to will insert the value into a container.

like image 126
BoBTFish Avatar answered Jan 08 '23 20:01

BoBTFish


Here's a way that's a little more cache friendly than unordered_map answer, but is O(n log n) instead of O(n), though it does not use any extra memory and does no allocations. Additionally, the overall multiplier is probably higher, in spite of it's cache friendliness.

#include <vector>
#include <algorithm>

void only_distinct_duplicates(::std::vector<int> &v)
{
    ::std::sort(v.begin(), v.end());
    auto output = v.begin();
    auto test = v.begin();
    auto run_start = v.begin();
    auto const end = v.end();
    for (auto test = v.begin(); test != end; ++test) {
       if (*test == *run_start) {
           if ((test - run_start) == 1) {
              *output = *run_start;
              ++output;
           }
       } else {
           run_start = test;
       }
    }
    v.erase(output, end);
}

I've tested this, and it works. If you want a generic version that should work on any type that vector can store:

template <typename T>
void only_distinct_duplicates(::std::vector<T> &v)
{
    ::std::sort(v.begin(), v.end());
    auto output = v.begin();
    auto test = v.begin();
    auto run_start = v.begin();
    auto const end = v.end();
    for (auto test = v.begin(); test != end; ++test) {
       if (*test != *run_start) {
           if ((test - run_start) > 1) {
              ::std::swap(*output, *run_start);
              ++output;
           }
           run_start = test;
       }
    }
    if ((end - run_start) > 1) {
        ::std::swap(*output, *run_start);
        ++output;
    }
    v.erase(output, end);
}
like image 36
Omnifarious Avatar answered Jan 08 '23 18:01

Omnifarious