Removing duplicates in a vector of strings

Tags: c++, vector
I have a vector of strings:

std::vector<std::string> fName;

which holds a list of file names <a,b,c,d,a,e,e,d,b>.

I want to remove every file name that appears more than once and keep only the names that appear exactly once in the vector.

for(size_t l = 0; l < fName.size(); l++)
{
    strFile = fName.at(l);
    for(size_t k = 1; k < fName.size(); k++)
    {
        strFile2 = fName.at(k);
        if(strFile.compare(strFile2) == 0)
        {
            fName.erase(fName.begin() + l);
            fName.erase(fName.begin() + k);
        }
    }
}

This removes some of the duplicates but leaves others behind. I need help debugging it.

Also, my input looks like <a,b,c,d,e,e,d,c,a> and my expected output is <b>: all the other files (a, c, d, e) have duplicates, so they are removed.

Deepak B asked Feb 11 '12 02:02


3 Answers

#include <algorithm>
#include <vector>

template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
  std::sort(vec.begin(), vec.end());  // bring equal elements next to each other
  vec.erase(std::unique(vec.begin(), vec.end()), vec.end());  // keep the first of each run
}

Note: this requires that T has operator< and operator== defined.
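
For a user-defined type, that means supplying both operators yourself. A minimal sketch, assuming a made-up FileEntry struct (it is not part of the question) that is compared by name only:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical type for illustration; only `name` takes part in comparisons.
struct FileEntry {
  std::string name;
  long size;
};

// std::sort uses operator< by default.
bool operator<(const FileEntry& a, const FileEntry& b) { return a.name < b.name; }
// std::unique uses operator== by default.
bool operator==(const FileEntry& a, const FileEntry& b) { return a.name == b.name; }

// The same sort-then-unique idea, written out for FileEntry.
void remove_duplicate_entries(std::vector<FileEntry>& entries)
{
  std::sort(entries.begin(), entries.end());
  entries.erase(std::unique(entries.begin(), entries.end()), entries.end());
}
```

The two operators should agree with each other (here both look only at name); otherwise equal entries may not end up adjacent after sorting.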

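Why does it work?

std::sort sorts the elements using their less-than comparison operator, so equal elements end up next to each other.

std::unique collapses each run of consecutive equal elements down to its first element, comparing them with their equality operator, and returns the new logical end; vec.erase then trims the leftover tail.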
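
To make the two steps concrete, here is a minimal sketch applied to a vector of file names like the one in the question. The helper name sorted_unique is made up for illustration. Note that it keeps one copy of every name, so it does not by itself leave only the names that occur once.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns a sorted copy of `in` with repeated values collapsed.
std::vector<std::string> sorted_unique(std::vector<std::string> in)
{
  std::sort(in.begin(), in.end());                   // equal names become adjacent
  auto new_end = std::unique(in.begin(), in.end());  // first of each run moves to the front
  in.erase(new_end, in.end());                       // drop the leftover tail
  return in;
}
```

For the question's input <a,b,c,d,e,e,d,c,a> this returns <a,b,c,d,e>.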

What if I want only the unique elements?

Then you are better off counting occurrences with std::map:

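#include <map>
#include <utility>
#include <vector>

// transform_if is not a standard algorithm, so a minimal version is defined here.
template <typename InIt, typename OutIt, typename Fn, typename Pred>
OutIt transform_if(InIt first, InIt last, OutIt out, Fn fn, Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first)) *out++ = fn(*first);
  return out;
}

template <typename T>
void unique_elements(std::vector<T>& vec)
{
  std::map<T, int> m;
  for (auto const& p : vec) ++m[p];  // count occurrences of each element
  // Overwrite the front of vec with the elements seen exactly once, then trim.
  vec.erase(transform_if(m.begin(), m.end(), vec.begin(),
                         [](std::pair<const T, int> const& p) { return p.first; },
                         [](std::pair<const T, int> const& p) { return p.second == 1; }),
            vec.end());
}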

See: here.

15 revs, 2 users 95% answered Nov 15 '22 20:11


If I understand your requirements correctly (and I'm not entirely sure that I do), you want to keep only the elements of your vector that do not repeat, correct?

Make a map of strings to ints to count the occurrences of each string. Clear the vector, then copy back only the strings that occurred exactly once.

map<string,int> m;
for (auto & i : v)
    m[i]++;
v.clear();
for (auto & i : m)
    if(i.second == 1)
        v.push_back(i.first);

Or, for the compiler-feature challenged:

map<string,int> m;
for (vector<string>::iterator i=v.begin(); i!=v.end(); ++i)
    m[*i]++;
v.clear();
for (map<string,int>::iterator i=m.begin(); i!=m.end(); ++i)
    if (i->second == 1)
        v.push_back(i->first);
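
Because std::map iterates in key order, both versions above leave the surviving strings sorted. If the original order matters, one possible variation (a sketch only, with the made-up helper name keep_singletons) is to count with std::unordered_map and then erase in place:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: erase every string that occurs more than once in `v`,
// leaving the survivors in their original order.
void keep_singletons(std::vector<std::string>& v)
{
  std::unordered_map<std::string, int> counts;
  for (const auto& s : v)
    ++counts[s];  // first pass: count occurrences

  // Second pass: erase-remove every string whose count is not 1.
  v.erase(std::remove_if(v.begin(), v.end(),
                         [&counts](const std::string& s) { return counts[s] != 1; }),
          v.end());
}
```

For the question's input <a,b,c,d,e,e,d,c,a> this leaves <b>.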

Benjamin Lindley answered Nov 15 '22 21:11


#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
  std::vector<T> tvec;
  for (std::size_t i = 0; i < vec.size(); i++) {
    // Keep vec[i] only if the same value does not appear again later,
    // so the last occurrence of each value survives.
    if (std::find(vec.begin() + i + 1, vec.end(), vec[i]) == vec.end()) {
      tvec.push_back(vec[i]);
    }
  }
  vec = tvec; // : )
}

perreal answered Nov 15 '22 20:11