How to find duplicates in std::vector<string> and return a list of them?

Tags: c++, functor, stl

So if I have a vector of words like:

Vec1 = "words", "words", "are", "fun", "fun"

resulting list: "fun", "words"

I am trying to determine which words are duplicated, and return one copy of each in alphabetical order. My problem is that I don't even know where to start; the only thing close that I found was std::unique_copy, which doesn't do exactly what I need. Specifically, I am inputting a std::vector<std::string> but outputting a std::list<std::string>. I can use a functor if needed.

Could someone at least push me in the right direction, please? I already tried reading the STL documentation, but I am just "brain blocked" right now.

asked Jul 27 '13 00:07

Marina Golubtsova

3 Answers

In 3 lines (not counting the vector and list creation, nor the superfluous line breaks added in the name of readability):

vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> output;

sort(vec.begin(), vec.end());
set<string> uvec(vec.begin(), vec.end());
set_difference(vec.begin(), vec.end(),
               uvec.begin(), uvec.end(),
               back_inserter(output));

EDIT

Explanation of the solution:

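1. Sorting the vector is needed in order to use set_difference() later.
2. The uvec set automatically keeps its elements sorted and eliminates duplicates.
3. The output list is populated with the elements of vec - uvec, taken as a multiset difference: a word that occurs n times in vec appears n - 1 times in output. A word that occurs three or more times is therefore listed more than once; calling output.unique() afterwards collapses those repeats.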
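For reference, here is a minimal self-contained sketch of the approach above, wrapped in a function. The name duplicates_by_set_difference is hypothetical, and the final output.unique() call is an addition that is not part of the original three lines; it is there because set_difference copies a word m - 1 times when the word occurs m times in the sorted vector.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (name not from the answer): returns each duplicated
// word once, in alphabetical order.
std::list<std::string> duplicates_by_set_difference(std::vector<std::string> vec) {
    std::list<std::string> output;

    // set_difference requires both input ranges to be sorted.
    std::sort(vec.begin(), vec.end());

    // One copy of every distinct word, kept in sorted order.
    std::set<std::string> uvec(vec.begin(), vec.end());

    // A word occurring m times in vec and once in uvec is copied m - 1 times.
    std::set_difference(vec.begin(), vec.end(),
                        uvec.begin(), uvec.end(),
                        std::back_inserter(output));

    // Assumption beyond the answer: collapse the adjacent repeats left by
    // words that occur three or more times.
    output.unique();
    return output;
}
```

The vector is taken by value so that the caller's data is not reordered by the sort; passing it by reference would avoid the copy if reordering is acceptable.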

answered Oct 19 '22 19:10

DanielKO

1. Make an empty std::unordered_set<std::string>.
2. Iterate over your vector, checking whether each item is a member of the set.
3. If it's already in the set, this is a duplicate, so add it to your result list.
4. Otherwise, add it to the set.

Since you want each duplicate only listed once in the results, you can use a hashset (not list) for the results as well.
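The steps above could be sketched roughly as follows. The function name duplicates_by_hash_set and its signature are illustrative assumptions, and the final sort is added only because the question asks for alphabetical output, which a hash set does not provide on its own.

```cpp
#include <list>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: the name and signature are illustrative only.
std::list<std::string> duplicates_by_hash_set(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;        // step 1: words met so far
    std::unordered_set<std::string> duplicated;  // each duplicate stored once

    for (const std::string& w : words) {         // step 2: walk the vector
        if (seen.count(w) != 0)
            duplicated.insert(w);                // step 3: seen before
        else
            seen.insert(w);                      // step 4: first occurrence
    }

    // Hash sets have no useful iteration order, so sort while building the list.
    std::list<std::string> result(duplicated.begin(), duplicated.end());
    result.sort();
    return result;
}
```

Called with the words from the question, this should return a list holding "fun" and "words", in that order.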

answered Oct 19 '22 18:10

Ben Voigt

IMO, Ben Voigt started with a good basic idea, but I would caution against taking his wording too literally.

In particular, I dislike the idea of searching for the string in the set, then adding it to your set if it's not present, and adding it to the output if it was present. This basically means every time we encounter a new word, we search our set of existing words twice, once to check whether a word is present, and again to insert it because it wasn't. Most of that searching will be essentially identical -- unless some other thread mutates the structure in the interim (which could give a race condition).

Instead, I'd start by trying to add it to the set of words you've seen. That returns a pair<iterator, bool>, with the bool set to true if and only if the value was inserted -- i.e., was not previously present. That lets us consolidate the search for an existing string and the insertion of the new string together into a single insert:

while (input >> word)
    if (!(existing.insert(word)).second)
        output.insert(word);

This also cleans up the flow enough that it's pretty easy to turn the test into a functor that we can then use with std::remove_copy_if to produce our results quite directly:

#include <set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>

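class show_copies {
    std::set<std::string> existing;   // words seen so far
public:
    // Returns true the first time a word is seen, so remove_copy_if
    // skips first occurrences and copies only the repeats.
    bool operator()(std::string const &in) {
        return existing.insert(in).second;
    }
};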

int main() {
    std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
    std::set<std::string> result;

    std::remove_copy_if(words.begin(), words.end(),
        std::inserter(result, result.end()), show_copies());

    for (auto const &s : result)
        std::cout << s << "\n";
}

Depending on whether I cared more about code simplicity or execution speed, I might use an std::vector instead of the set for result, and use std::sort followed by std::unique_copy to produce the final result. In such a case I'd probably also replace the std::set inside of show_copies with an std::unordered_set instead:

#include <unordered_set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>

class show_copies {
    std::unordered_set<std::string> existing;
public:
    bool operator()(std::string const &in) {
        return existing.insert(in).second;
    }
};

int main() {
    std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
    std::vector<std::string> intermediate;

    // Copy only the repeated occurrences into intermediate.
    std::remove_copy_if(words.begin(), words.end(),
        std::back_inserter(intermediate), show_copies());

    // Sort, then print each distinct duplicate once.
    std::sort(intermediate.begin(), intermediate.end());
    std::unique_copy(intermediate.begin(), intermediate.end(),
        std::ostream_iterator<std::string>(std::cout, "\n"));
}

This is marginally more complex (one whole line longer!) but likely to be substantially faster when/if the number of words gets very large. Also note that I'm using std::unique_copy primarily to produce visible output. If you just want the result in a collection, you can use the standard unique/erase idiom to get unique items in intermediate.
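As a rough illustration of the unique/erase idiom mentioned above, the sketch below sorts a vector and removes repeated entries in place. The helper name sort_and_deduplicate is hypothetical; the parameter is named intermediate only to match the code above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the vector, then erases adjacent repeats so
// that each distinct string remains exactly once.
void sort_and_deduplicate(std::vector<std::string>& intermediate) {
    std::sort(intermediate.begin(), intermediate.end());

    // std::unique moves the distinct elements to the front and returns an
    // iterator to the new logical end; erase() then drops the leftover tail.
    intermediate.erase(std::unique(intermediate.begin(), intermediate.end()),
                       intermediate.end());
}
```

The sort has to come first because std::unique only removes repeats that sit next to each other.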

answered Oct 19 '22 19:10

Jerry Coffin
