Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C/C++ - Efficient way to compare two lists and find missing elements

I have two lists, L1 and L2, of data containing multiple elements, each unique, of an abstract data type (ie: structs). Each of the two lists:

  • May contain between zero and one-hundred (inclusive) elements.
  • Contains no duplicate elements (each element is unique).
  • May or may not contain elements in the other list (ie: L1 and L2 might be identical, or contain completely different elements).
  • Is not sorted.
  • At the lowest level, is stored withing a std::vector<myStruct> container.

What I am typically expecting is that periodically, a new element is added to L2, or an element is subtracted/removed from it. I am trying to detect the differences in the two lists as efficiently (ie: with minimal comparisons) as possible:

  • If an entry is not present in L2 and is present in L1, carry out one operation: Handle_Missing_Element().
  • If an entry is present in L2 and not present in L1, carry out another operation: Handle_New_Element().

Once the above checks are carried out, L1 is set to be equal to L2, and at some time in the future, L2 is checked again.

How could I go about finding out the differences between the two lists? There are two approaches I can think of:

  1. Compare both lists via every possible combination of elements. Possibly O(n2) execution complexity (horrible).

bool found;
for i in 1 .. L2->length()
  found = false;
  for j in 1 .. L1->length()
    if (L1[j] == L2[i]
      // Found duplicate entry
      found = true;
    fi
  endfor
endfor
  1. Sort the lists, and compare the two lists element-wise until I find a difference. This seems like it would be in near-linear time. The problem is that I would need the lists to be sorted. It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preseve the sorting of the list.

Is there a straightforward way to accomplish this efficiently in C++? I've found similar such problems, but I need to do more than just find the intersection of two sets, or do such a test with just a set of integers, where sum-related tricks can be used, as I need to carry out different operations for "new" vs "missing" elements.

Thank you.

like image 482
Cloud Avatar asked Jul 28 '15 03:07

Cloud


1 Answers

It would be impractical to manually sort the underlying vector after each addition/removal for the list. It would only be reasonable to do this if it were somehow possible to force vector::push_back() to automatically insert elements such that insertions preseve the sorting of the list.

What you're talking about here is an ordered insert. There are functions in <algorithm> that allow you do do this. Rather than using std::vector::push_back you would use std::vector::insert, and call std::lower_bound which does a binary search for the first element not less than than a given value.

auto insert_pos = std::lower_bound( L2.begin(), L2.end(), value );
if( insert_pos == L2.end() || *insert_pos != value )
{
    L2.insert( insert_pos, value );
}

This makes every insertion O(logN) but if you are doing fewer than N insertions between your periodic checks, it ought to be an improvement.

The zipping operation might look something like this:

auto it1 = L1.begin();
auto it2 = L2.begin();

while( it1 != L1.end() && it2 != L2.end() )
{
    if( *it1 < *it2 ) {
        Handle_Missing( *it1++ );
    } else if( *it2 < *it1 ) {
        Handle_New( *it2++ );
    } else {
        it1++;
        it2++;
    }
}

while( it1 != L1.end() ) Handle_Missing( *it1++ );
while( it2 != L2.end() ) Handle_New( *it2++ );
like image 167
paddy Avatar answered Oct 14 '22 05:10

paddy