So if I have a vector of words like:
Vec1 = "words", "words", "are", "fun", "fun"
resulting list: "fun", "words"
I am trying to determine which words are duplicated, and return an alphabetized vector of 1 copy of them. My problem is that I don't even know where to start, the only thing close to it I found was std::unique_copy
which doesn't exactly do what I need. And specifically, I am inputting a std::vector<std::string>
but outputting a std::list<std::string>
. And if needed, I can use functor.
Could someone at least push me in the right direction please? I already tried reading stl documentation,but I am just "brain" blocked right now.
Using std::set_difference To find duplicates present in a vector, we can find the set difference between the original elements and the distinct elements. The following code example demonstrates this using the standard algorithm std::set_difference .
Get the stream of elements in which the duplicates are to be found. For each element in the stream, count the frequency of each element, using Collections. frequency() method. Then for each element in the collection list, if the frequency of any element is more than one, then this element is a duplicate element.
Algorithm. Define a string and take the string as input form the user. Two loops will be used to find the duplicate characters. Outer loop will be used to select a character and then initialize variable count by 1 its inside the outer loop so that the count is updated to 1 for every new character.
In 3 lines (not counting the vector and list creation nor the superfluous line-breaks in name of readability):
vector<string> vec{"words", "words", "are", "fun", "fun"};
list<string> output;
sort(vec.begin(), vec.end());
set<string> uvec(vec.begin(), vec.end());
set_difference(vec.begin(), vec.end(),
uvec.begin(), uvec.end(),
back_inserter(output));
Explanation of the solution:
Sorting the vector is needed in order to use set_difference()
later.
The uvec
set will automatically keep elements sorted, and eliminate duplicates.
The output
list will be populated by the elements of vec - uvec
.
std::unordered_set<std::string>
Since you want each duplicate only listed once in the results, you can use a hashset (not list) for the results as well.
IMO, Ben Voigt started with a good basic idea, but I would caution against taking his wording too literally.
In particular, I dislike the idea of searching for the string in the set, then adding it to your set if it's not present, and adding it to the output if it was present. This basically means every time we encounter a new word, we search our set of existing words twice, once to check whether a word is present, and again to insert it because it wasn't. Most of that searching will be essentially identical -- unless some other thread mutates the structure in the interim (which could give a race condition).
Instead, I'd start by trying to add it to the set of words you've seen. That returns a pair<iterator, bool>
, with the bool
set to true
if and only if the value was inserted -- i.e., was not previously present. That lets us consolidate the search for an existing string and the insertion of the new string together into a single insert:
while (input >> word)
if (!(existing.insert(word)).second)
output.insert(word);
This also cleans up the flow enough that it's pretty easy to turn the test into a functor that we can then use with std::remove_copy_if
to produce our results quite directly:
#include <set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::set<std::string> result;
std::remove_copy_if(words.begin(), words.end(),
std::inserter(result, result.end()), show_copies());
for (auto const &s : result)
std::cout << s << "\n";
}
Depending on whether I cared more about code simplicity or execution speed, I might use an std::vector
instead of the set
for result, and use std::sort
followed by std::unique_copy
to produce the final result. In such a case I'd probably also replace the std::set
inside of show_copies
with an std::unordered_set
instead:
#include <unordered_set>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
class show_copies {
std::unordered_set<std::string> existing;
public:
bool operator()(std::string const &in) {
return existing.insert(in).second;
}
};
int main() {
std::vector<std::string> words{ "words", "words", "are", "fun", "fun" };
std::vector<std::string> intermediate;
std::remove_copy_if(words.begin(), words.end(),
std::back_inserter(intermediate), show_copies());
std::sort(intermediate.begin(), intermediate.end());
std::unique_copy(intermediate.begin(), intermediate.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
This is marginally more complex (one whole line longer!) but likely to be substantially faster when/if the number of words gets very large. Also note that I'm using std::unique_copy
primarily to produce visible output. If you just want the result in a collection, you can use the standard unique/erase idiom to get unique items in intermediate
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With