What is the fastest way to check whether an array has two equal elements?

Suppose we have a very long array of, say, ints, to keep the problem simple.

What is the fastest way (or just a fast way, if it's not the fastest) in C++ to check whether an array contains any element more than once, i.e. any duplicates?

To clarify, the function should return:

[2, 5, 4, 3] => false
[2, 8, 2, 5, 7, 3, 4] => true
[8, 8, 5] => true
[1, 2, 3, 4, 1, 7, 1, 1, 7, 1, 2, 2, 3, 4] => true
[9, 1, 12] => false

One strategy is to loop through the array and, for each element, loop through the array again to look for a match. However, this is expensive: it takes O(n^2) comparisons. Is there a better way?
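For reference, the brute-force strategy described above can be sketched roughly as follows (the name `has_duplicates_naive` is made up for illustration). It compares each pair only once, but still performs n·(n−1)/2 comparisons in the worst case:

```cpp
#include <cstddef>
#include <vector>

// Brute-force duplicate check: compare each element with every later element.
// Worst case (no duplicates) performs n*(n-1)/2 comparisons, i.e. O(n^2).
bool has_duplicates_naive(const std::vector<int>& vec)
{
    for (std::size_t i = 0; i < vec.size(); ++i)
        for (std::size_t j = i + 1; j < vec.size(); ++j)
            if (vec[i] == vec[j])
                return true; // found two equal elements
    return false;
}
```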

Asked Sep 05 '21 at 20:09 by new QOpenGLWidget

3 Answers

(Update below.) Insert the array elements into a std::unordered_set; if an insertion fails, the element is already in the set, which means you have a duplicate.

Something like the following:

#include <iostream>
#include <vector>
#include <unordered_set>

bool has_duplicates(const std::vector<int>& vec)
{
    std::unordered_set<int> set;
    for (int ele : vec)
        if (const auto [iter, inserted] = set.emplace(ele); !inserted)
            return true; // has duplicates!
    return false;
}

int main()
{
    std::vector<int> vec1{ 1, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec1) << '\n'; // false

    std::vector<int> vec2{ 12, 3, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec2) << '\n'; // true
}

Update: As discussed in the comments, this may or may not be the fastest solution. In the OP's case, as explained in Marcus Müller's answer, an O(N·log(N)) method would be better, which we can achieve by sorting the array and then checking adjacent elements for duplicates.
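A minimal sketch of the sort-then-scan idea, assuming a copy of the input is acceptable. The name `has_duplicates_sorted` is hypothetical, and this is not necessarily the exact code behind the "ArraySort" benchmark. After sorting, equal values sit next to each other, so one linear pass with std::adjacent_find is enough:

```cpp
#include <algorithm>
#include <vector>

// O(N log N) duplicate check: after sorting, equal values end up next to each
// other, so a single pass over neighbouring elements finds any duplicate.
// Takes the vector by value so the caller's data is not reordered.
bool has_duplicates_sorted(std::vector<int> vec)
{
    std::sort(vec.begin(), vec.end());
    // std::adjacent_find returns the first position i with vec[i] == vec[i + 1],
    // or vec.end() if there is no such pair.
    return std::adjacent_find(vec.begin(), vec.end()) != vec.end();
}
```

If the caller can afford to have its array reordered, taking the vector by reference avoids the extra copy.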

Here is a quick benchmark I made for the two cases, "UnorderedSetInsertion" and "ArraySort". These are the results for GCC 10.3, C++20, -O3:

[Benchmark chart: UnorderedSetInsertion vs. ArraySort]

Answered Nov 15 '22 at 08:11 by JeJo


This is essentially a sorting problem, except that you can abort the sort as soon as you hit a single equality and return true.

So, if you're memory-limited (which is often the real constraint, rather than time), an in-place sorting algorithm that aborts when it encounters two identical elements will do; for example, std::sort with a comparator function that throws an exception when it encounters equality. The complexity would be O(N·log(N)), but let's be honest here: the fact that this probably involves less indirect memory addressing than building a node-based bucket structure such as a hash set might help. In that sense, I can only recommend that you actually compare this to JeJo's solution; that looks pretty reasonable, too!
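A minimal sketch of the "abort the sort on equality" idea, assuming exceptions are acceptable for control flow here and that the input array may be clobbered; `duplicate_found` and `has_duplicates_sort_abort` are hypothetical names. It relies on the fact that a correct comparison sort cannot finish without comparing at least one pair of equal elements when duplicates are present, so the comparator is guaranteed to see one. The address check guards against debug-mode standard libraries that compare an element with itself to verify the ordering:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tag type, thrown only to abort the sort early.
struct duplicate_found {};

// Sorts `vec` in place and aborts as soon as the comparator sees two equal
// elements. No extra array is allocated, but `vec` is modified; after an early
// abort its contents should be treated as garbage.
bool has_duplicates_sort_abort(std::vector<int>& vec)
{
    try {
        std::sort(vec.begin(), vec.end(), [](const int& a, const int& b) {
            // Comparing an element with itself (same address) is not a duplicate.
            if (&a == &b)
                return false;
            if (a == b)
                throw duplicate_found{}; // two distinct elements compare equal
            return a < b;
        });
    } catch (const duplicate_found&) {
        return true;
    }
    return false;
}
```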

There is very likely no one-size-fits-all solution here: what is fastest will depend on how many integers we're talking about. Even a quadratic algorithm might beat any of our "clever" answers if it keeps memory access nice and linear. I'm almost certain your speed here is bounded not by your CPU, but by the amount of data you need to shuffle to and from RAM.

Answered Nov 15 '22 at 08:11 by Marcus Müller


How about binning the data (i.e., building a histogram) and checking the count of its most frequent value (the mode)? If that count is greater than 1, some value is repeated.
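A minimal sketch of the histogram idea, assuming the values are known to lie in a small range [0, max_value]; `max_value` and `has_duplicates_histogram` are hypothetical and not part of the answer. Instead of building the full histogram and then computing the mode, it stops as soon as any bin would reach a count of 2, which is equivalent to the mode's count exceeding 1:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Histogram (counting) duplicate check for values known to lie in [0, max_value].
// Runs in O(N + max_value) time and uses max_value + 1 small counters.
bool has_duplicates_histogram(const std::vector<int>& vec, int max_value)
{
    if (max_value < 0)
        throw std::invalid_argument("max_value must be >= 0");
    // One bin per possible value; a bin never needs to count past 2.
    std::vector<unsigned char> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int v : vec) {
        if (v < 0 || v > max_value)
            throw std::out_of_range("value outside [0, max_value]");
        if (counts[v]++ != 0) // bin already held one entry: this value repeats
            return true;
    }
    return false;
}
```

This only pays off when the value range is small relative to the array; for large or unknown ranges, the hash-set or sorting approaches are more practical.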

Answered Nov 15 '22 at 07:11 by justAstudent