Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the difference between two lists

In my current project I'm trying to compare two lists of objects, finding out if objects have been added, removed, changed or stayed the same.

I'm leveraging IEnumerable.Except for this as follows:

Dim newOnes = current.Except(previous, equalityComparer).ToList
Dim removedOnes = previous.Except(current, equalityComparer).ToList()
Dim existingOnes = current.Except(newOnes, equalityComparer).ToList
Dim changedOnes = existingOnes.Except(previous, changedComparer).ToList()
Dim unchangedOnes = existingOnes.Except(changedOnes, equalityComparer).ToList()

For this I have to implement IEqualityComparers.
Finding out if a pair of objects has changed in property values (changedOnes), requires I write a 'changedComparer' which is an IEqualityComparer that checks on non-identity-defining-fields (such as member collections).

As the Except method apparently first checks the GetHashCode and doesn't go to the Equals method if the hashes are equal, my setup falls apart.

I'm currently solving this as follows:

Public Overloads Function GetHashCode(obj As Family) As Integer Implements IEqualityComparer(Of Family).GetHashCode
    Dim hashCode As Long = 17
    If obj.ClientCode IsNot Nothing Then hashCode = CInt(((hashCode * 397) Xor obj.ClientCode.GetHashCode()) Mod Integer.MaxValue)
    ' SNIP a bunch more property fields
    If obj.Members IsNot Nothing Then hashCode = CInt(((hashCode * 397) Xor obj.Members.GetHashCode()) Mod Integer.MaxValue)

    Return CInt(hashCode Mod Integer.MaxValue)
End Function

Adding the hash of the Members list, always returns a different hash as it is checking the instance, not the content. This works for now, but off course removes all the advantages of having a hash.

UPDATE
What I'm looking for is not a better Equals method, but I question my entire methodology (maybe there is something OOTB, should I use a different interface). Failing that, how can I have a good GetHashcode when my property collection should be taken into account?

like image 382
Boris Callens Avatar asked Dec 02 '13 14:12

Boris Callens


2 Answers

I think Enumerable.SequenceEqual should do the trick, as it always uses the default equality comparer without requiring implementation of IEqualityComparer. The caveat though is that it also checks for order comparison.

like image 144
Jon Limjap Avatar answered Nov 15 '22 09:11

Jon Limjap


I use a KeyEqualityComparer<T, TKey> class that allows me to specify a key extracting lambda to detect any new or removed items.

I've implemented a ListDifference (in C#, sorry) utility, that when given a keyExtractor and comparer spits out a list of new, deleted, changed and unchanged items. It does the obvious thing, i.e.

public ListDifference(IEnumerable<T> original, 
                      IEnumerable<T> updated, 
                      Func<T, object> keyExtractor = null, 
                      Func<T, T, bool> comparer = null)
{
  if (keyExtractor == null)
    keyExtractor = t => t;

  AddedItems = updated.Difference(original, keyExtractor);
  RemovedItems = original.Difference(updated, keyExtractor);

  var matches = from u in updated
                from o in original
                where keyExtractor(u).Equals(keyExtractor(o))
                select Tuple.Create(o, u);

  if (comparer == null)
  {
     //we are unable to detect changed items
    ChangedItems = null;
    UnchangedItems = matches;
  }
  else
  {
    ChangedItems = matches.Where(t => !comparer(t.Item1, t.Item2));
    UnchangedItems = matches.Where(t => comparer(t.Item1, t.Item2));
  }
}
like image 20
SWeko Avatar answered Nov 15 '22 09:11

SWeko