Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Suggest a suitable algorithm for merging two arrays containing class objects (without duplication)

I have an array where each position contains a class object with three int values (x,y,z). now from an different array all the elements have to be copied into source array. For each array element we need to check for x,y,z values in order to avoid duplication. Is it possible to do more efficiently than o(n^2) ?

like image 903
rwik Avatar asked Jan 16 '23 08:01

rwik


1 Answers

Provided that you don't mind losing the original order of the two arrays:

std::sort(first_array, first_array + N);
std::sort(second_array, second_array + M);
std::set_union(
    first_array, first_array+N, 
    second_array, second_array+M, 
    target_array
);

N and M are the numbers of elements in the arrays. You need to either define operator< or specialize std::less for your class: alternatively write a comparator function and supply it to sort and set_union.

Time complexity is O(N log N + M log M) -- the sort is the slower part and then set_union is linear.

If first_array or second_array might already contain dupes within themselves (not just between them), then you need an extra step to remove them, which loses not just the order but also the dupes in the source arrays:

std::sort(first_array, first_array + N);
MyClass *first_end = std::unique(first_array, first_array + N);
std::sort(second_array, second_array + M);
MyClass *second_end = std::unique(second_array, second_array + M);
std::set_union(
    first_array, first_end, 
    second_array, second_end, 
    target_array
);

Alternatively you could write a modified version of set_union that merges and dedupes in a single pass.

[Edit: sorry, in writing this I missed that the result is eventually going back into first_array, not into a separate target_array. set_union doesn't work with the output as one of the inputs, so this also requires extra memory for the target array, which can then be copied back to the source array assuming of course that the source is big enough.]

If you do want to preserve the order of the original arrays, then you can create a container and check as you go:

container<MyClass> items(first_array, first_array + N);
MyClass *dst = first_array + N;
for (MyClass *it = second_array; it != second_array + M; ++it) {
    if (items.count(*it) == 0) {
        items.insert(*it);
        *dst++ = *it;
    }
}

If the arrays can contain dupes within themselves then start with items empty and dst = first_array, then loop over both input arrays.

container could be std::set (in which case time is O(N log N + M log(N + M)), which in fact is O(N log N + M log M) again, and you still need an order comparator), or else std::unordered_set in C++11 (in which case expected time is O(N + M) with pathological worst-cases, and you need to specialize std::hash or otherwise write a hash function and also provide an equals function, instead of an order comparator). Prior to C++11, other hash containers are available just not in the standard.

If you don't mind the extra memory and don't mind losing the original order:

container<MyClass> items(first_array, first_array + N);
items.insert(second_array, second_array + M);
std::copy(items.begin(), items.end(), first_array);

If you don't want to use (much) extra memory and have space in the source array for M additional elements, as opposed to merely having space for the result:

std::copy(second_array, second_array + M, first_array + N);
std::sort(first_array, first_array + N + M);
MyClass *dst = std::unique(first_array, first_array + N + M);
// result now has (dst - first_array) elements
like image 84
Steve Jessop Avatar answered Jan 22 '23 19:01

Steve Jessop