
What is the fastest way to see if an array has two common elements?

Tags: c++, arrays, algorithm, duplicates, c++17

Suppose we have a very long array of, say, int, to keep the problem simple.

What is the fastest way (or at least a fast way) in C++ to check whether an array contains two or more equal elements?

To clarify, the function should return the following:

[2, 5, 4, 3] => false
[2, 8, 2, 5, 7, 3, 4] => true
[8, 8, 5] => true
[1, 2, 3, 4, 1, 7, 1, 1, 7, 1, 2, 2, 3, 4] => true
[9, 1, 12] => false

One strategy is to loop through the array and, for each element, loop through the array again to check for a match. However, this is expensive: O(n^2) comparisons. Is there a better way?
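For reference, the nested-loop strategy described above can be sketched as follows (the name has_duplicates_naive is just an illustrative choice). Starting the inner loop at i + 1 avoids comparing each pair twice, but the worst case is still n(n-1)/2 comparisons, i.e. O(n^2).

```cpp
#include <cstddef>
#include <vector>

// Brute-force check from the question: compare every element with every
// element after it. Needs O(1) extra memory and returns as soon as the
// first equal pair is found; worst case is O(n^2) comparisons.
bool has_duplicates_naive(const std::vector<int>& vec)
{
    for (std::size_t i = 0; i < vec.size(); ++i)
        for (std::size_t j = i + 1; j < vec.size(); ++j)
            if (vec[i] == vec[j])
                return true;  // found two equal elements
    return false;
}
```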


asked Sep 05 '21 20:09

new QOpenGLWidget

3 Answers

(See the update below.) Insert the array elements into a std::unordered_set; if an insertion fails, the array contains duplicates.

Something like the following:

#include <iostream>
#include <vector>
#include <unordered_set>

bool has_duplicates(const std::vector<int>& vec)
{
    std::unordered_set<int> set;
    for (int ele : vec)
        if (const auto [iter, inserted] = set.emplace(ele); !inserted)
            return true; // has duplicates!
    return false;
}

int main()
{
    std::vector<int> vec1{ 1, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec1) << '\n'; // false

    std::vector<int> vec2{ 12, 3, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec2) << '\n'; // true
}

Update: As discussed in the comments, this may or may not be the fastest solution. In the OP's case, as explained in Marcus Müller's answer, an O(N·log(N)) method would be better, which we can achieve by sorting the array and then checking it for duplicates.

Here is a quick benchmark I made for the two cases, "UnorderedSetInsertion" and "ArraySort". Following are the results for GCC 10.3, C++20, -O3:

[Benchmark chart: https://i.stack.imgur.com/r8NHq.png]
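The "sort, then check for dupes" approach from the update can be sketched with standard algorithms as below. This is a reconstruction for illustration, not the code used in the benchmark, and has_duplicates_sorted is a hypothetical name. After sorting, equal values end up next to each other, so a single linear pass with std::adjacent_find is enough.

```cpp
#include <algorithm>
#include <vector>

// Sort-based O(N·log(N)) check. Equal values become neighbours after
// sorting, so std::adjacent_find (which uses operator== by default)
// finds a duplicate in one linear pass.
// Takes the vector by value so the caller's data keeps its order; pass
// it with std::move if the original order is not needed.
bool has_duplicates_sorted(std::vector<int> vec)
{
    std::sort(vec.begin(), vec.end());
    return std::adjacent_find(vec.begin(), vec.end()) != vec.end();
}
```

Taking the vector by value costs an O(N) copy; if modifying the caller's array is acceptable, a non-const reference parameter sorts it in place instead.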


answered Nov 15 '22 08:11

JeJo

This is almost just a sorting problem, except that you can abort the sort as soon as you hit a single equality and return true.

So, if you're memory-limited (which is often the case, rather than actually being time-limited), an in-place sorting algorithm that aborts when it encounters two identical elements will do; for example, std::sort with a comparator function that throws an exception when it encounters equality. Complexity would be O(N·log(N)), but let's be honest here: the fact that this probably involves less indirect memory addressing than building a tree-like bucket structure might help. In that sense, I can only recommend that you actually compare this to JeJo's solution; that looks pretty reasonable, too!
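A minimal sketch of the abort-on-equality idea, using a hypothetical exception type duplicate_found and a lambda comparator. Any correct comparison sort must compare at least one pair of equal elements when duplicates exist, so none can slip through. It does rely on an assumption that the standard does not (to my knowledge) guarantee: that std::sort never compares an element with itself, which would trigger a false positive here.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical marker type thrown by the comparator to abort the sort.
struct duplicate_found {};

// In-place check: sort vec with a comparator that throws as soon as it is
// asked to compare two equal values.
// ASSUMPTION: the std::sort implementation never compares an element with
// itself (or with a copy of itself); otherwise this reports a false duplicate.
// When a duplicate is found, vec's contents are unspecified afterwards
// (not necessarily even a permutation of the original values).
bool has_duplicates_inplace(std::vector<int>& vec)
{
    try {
        std::sort(vec.begin(), vec.end(), [](int a, int b) {
            if (a == b)
                throw duplicate_found{};  // equality seen: stop sorting
            return a < b;
        });
    } catch (const duplicate_found&) {
        return true;
    }
    return false;  // sort finished without ever seeing two equal values
}
```

The exception is thrown at most once per call, so its cost is small compared to the sort itself; the main price is that the input array is modified.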

The thing here is that there's very likely no one-size-fits-all solution: what is fastest will depend on how many integers we're talking about. Even quadratic complexity might beat any of our "clever" answers if it keeps memory access nice and linear; I'm almost certain your speed here is bounded not by your CPU, but by the amount of data you need to shuffle to and from RAM.

answered Nov 15 '22 08:11

Marcus Müller

How about binning the data (i.e., creating a histogram) and checking the mode of the result? If the most frequent value occurs more than once, there is a repeated value.
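The binning idea is easiest to make concrete when the values fall in a known, reasonably small range. The sketch below assumes such bounds lo and hi are known (they are not part of the question; has_duplicates_counting is a hypothetical name). Instead of building the full histogram and then computing its mode, it stops at the first bin that would reach a count of 2, which answers the same question.

```cpp
#include <cstddef>
#include <vector>

// Histogram-style check, ASSUMING all values lie in a known range [lo, hi]
// (lo <= hi). One flag per possible value acts as a bin whose count is
// capped at 1; finding a bin that is already set means some value repeats.
// Time O(N + (hi - lo)), extra memory O(hi - lo) bits, so this only pays
// off when the value range is not much larger than the array itself.
bool has_duplicates_counting(const std::vector<int>& vec, int lo, int hi)
{
    // Use long long so that hi - lo cannot overflow int.
    const auto bins = static_cast<std::size_t>(static_cast<long long>(hi) - lo) + 1;
    std::vector<bool> seen(bins, false);
    for (int ele : vec) {
        const auto idx = static_cast<std::size_t>(static_cast<long long>(ele) - lo);
        if (seen.at(idx))  // at() throws std::out_of_range for values outside [lo, hi]
            return true;   // second occurrence: this bin would reach a count of 2
        seen[idx] = true;
    }
    return false;
}
```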

answered Nov 15 '22 07:11

justAstudent
