I have a std::vector with elements like this:

[0, 1, 2, 0, 2, 1, 0, 0, 188, 220, 0, 1, 2]

What is the most efficient way to find and copy the elements that occur only once in this vector, excluding the brute-force O(n^2) algorithm? In this case the new list should contain [188, 220].
Build an unordered_map<DataType, Count> count; walk the input once doing count[value]++; then iterate over the count map, copying the keys for which the value is 1. It's O(n). You have hashes, so for small data sets a normal map might be more efficient, but technically it would be O(n log n).

It's a good method for discrete data sets.
Code example:

```cpp
#include <iostream>
#include <unordered_map>
#include <vector>

using namespace std;

int main() {
    vector<int> v{1, 1, 2, 3, 3, 4};
    unordered_map<int, int> count;
    for (const auto& e : v) count[e]++;
    vector<int> once;
    for (const auto& e : count) if (e.second == 1) once.push_back(e.first);
    for (const auto& e : once) cout << e << '\n';
    return 0;
}
```
I have tried a few ideas, but I don't see a way around a map. unordered_multiset is almost a great way... except it does not let you iterate over distinct keys. It has a method to check the count of a key, but you would need another set just for the keys to probe, so I don't see it as a simpler way. In modern C++ with auto, counting is easy. I've also looked through the algorithm library, but I haven't found any transform, copy_if, generate, etc. which could conditionally transform an element (map entry -> key if count is 1).
There are very few universally-optimal algorithms. Which algorithm works best usually depends upon the properties of the data that's being processed. Removing duplicates is one such example.
Is v
small and filled mostly with unique values?
```cpp
auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::mismatch(lo + 1, v.end(), lo).first;
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}
```
Is v
small and filled mostly with duplicates?
```cpp
auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::upper_bound(lo + 1, v.end(), *lo);
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}
```
Is v
gigantic?
```cpp
std::unordered_map<int, bool> keyUniqueness{};
keyUniqueness.reserve(v.size());
for (int key : v) {
    bool wasMissing = keyUniqueness.find(key) == keyUniqueness.end();
    keyUniqueness[key] = wasMissing;
}
v.clear();
for (const auto& element : keyUniqueness) {
    if (element.second) { v.push_back(element.first); }
}
```
And so on.
@luk32's answer is definitely the most time-efficient way of solving this question. However, if you are short on memory and can't afford an unordered_map, there are other ways of doing it.

You can use std::sort() to sort the vector first. Then the non-duplicates can be found in one iteration. Overall complexity is O(n log n).
If the question is slightly different, and you know there is only one non-duplicate element, you can use this code (code in Java). The complexity here is O(n).
Since you use a std::vector, I presume you want to maximize all its benefits, including locality of reference. In order to do that, we need a bit of typing here. And I benchmarked the code below...

I have a linear O(n) algorithm here (effectively O(n log n), since the input must be sorted first); it's a bit like brian's answer, but I use OutputIterators instead of doing it in-place. The pre-condition is that the input is sorted.
```cpp
template<typename InputIterator, typename OutputIterator>
OutputIterator single_unique_copy(InputIterator first, InputIterator last, OutputIterator result) {
    if (first == last) return result;
    auto previous = first;
    ++first;
    while (first != last) {
        if (*first == *previous) {
            // Skip the whole run of equal values; none of them is unique.
            while (++first != last && *first == *previous)
                ;
            if (first == last) return result;  // the duplicate run reached the end
        } else {
            *result++ = *previous;  // previous occurred exactly once
        }
        previous = first;
        ++first;
    }
    *result++ = *previous;  // the final element closes a run of length 1
    return result;
}
```
And here is a sample usage:

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::vector<int> vm = {0, 1, 2, 0, 2, 1, 0, 0, 1, 88, 220, 0, 1, 2, 227, -8};
    std::vector<int> kk;
    std::sort(vm.begin(), vm.end());
    single_unique_copy(vm.begin(), vm.end(), std::back_inserter(kk));
    for (auto x : kk) std::cout << x << ' ';
    return 0;
}
```

As expected, the output is:

-8 88 220 227
Your use case may be different from mine, so, profile first... :-)
EDIT: I benchmarked against the unordered_map approach, filling the vector with values generated by i % 5. In one run it took 9.34 seconds and mine 7.80 seconds; in another, it took 2.71 seconds and mine 0.52 seconds. For smaller numbers the difference still holds, until it becomes non-critical code.