
Distinct list of lists, where lists contain the same values but in a different order

Tags: c#, list, distinct

I have a list:

var list = new List<List<int>>();

which could contain

list[0] = {1, 2, 3, 4}
list[1] = {3, 1, 2, 4}
list[2] = {2, 1, 7, 3}

How can I detect the duplicate between [0] and [1] and remove one of them? The code is C#.

In reality the element type is not an int, but that shouldn't change the question.

Jim Carragher asked Dec 23 '10 10:12


2 Answers

You could write your own implementation of IEqualityComparer<List<int>>. For GetHashCode() it would simply return the XOR of all the hash codes of the elements in the list. For Equals() it would create a new HashSet<int> from the first list, and call HashSet<T>.SetEquals on it, passing in the second list. This assumes there will be no duplicate elements, mind you. (Otherwise { 1, 1, 2 } will be equal to { 1, 2, 2 } but have a different hash code.)

Once you've got that far, you can use Distinct:

var distinct = list.Distinct(new CustomEqualityComparer());
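The comparer described above might look roughly like the following. This is only a sketch: the class body is an assumption built from the description (XOR of element hash codes, `HashSet<T>.SetEquals` for equality), the null handling is an added choice, and `DistinctDemo` is a hypothetical helper that reuses the lists from the question. It also assumes no list contains the same element twice.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch of the comparer described above; assumes each list has no repeated elements.
public sealed class CustomEqualityComparer : IEqualityComparer<List<int>>
{
    public bool Equals(List<int> x, List<int> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;

        // Order-insensitive comparison: build a set from one list, test the other against it.
        return new HashSet<int>(x).SetEquals(y);
    }

    public int GetHashCode(List<int> list)
    {
        if (list == null) return 0;

        // XOR is commutative, so the order of the elements does not affect the result.
        int hash = 0;
        foreach (int item in list)
            hash ^= item.GetHashCode();
        return hash;
    }
}

// Hypothetical demo helper using the data from the question.
public static class DistinctDemo
{
    public static List<List<int>> Run()
    {
        var list = new List<List<int>>
        {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 3, 1, 2, 4 },
            new List<int> { 2, 1, 7, 3 },
        };
        return list.Distinct(new CustomEqualityComparer()).ToList();
    }
}
```

With the three lists from the question, `DistinctDemo.Run()` should return two lists, because [0] and [1] compare equal and hash alike.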

As an alternative approach, could you use HashSet<T> as your collection type to start with? Then it's really easy:

var distinct = sets.Distinct(HashSet<int>.CreateSetComparer());

If you need lists as the input but can cope with sets as the output:

var distinct = list.Select(x => new HashSet<int>(x))
                   .Distinct(HashSet<int>.CreateSetComparer());
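If lists are needed on the way out as well, the sets can be converted back after the `Distinct` call. A minimal sketch, where `SetDistinctDemo` and `DistinctAsLists` are hypothetical names chosen for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lists in, lists out, with sets used only for the comparison.
public static class SetDistinctDemo
{
    public static List<List<int>> DistinctAsLists(List<List<int>> list)
    {
        return list.Select(x => new HashSet<int>(x))
                   .Distinct(HashSet<int>.CreateSetComparer())
                   .Select(set => set.ToList())   // back to List<int>
                   .ToList();
    }
}
```

Note that the round trip through `HashSet<int>` drops repeated elements within a list and does not promise to keep the original element order, so this only fits when neither matters.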
Jon Skeet answered Sep 23 '22 16:09


Here's the equality comparer Jon Skeet is talking about (his advice regarding working with HashSets to begin with is also spot on, of course):

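    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Compares sequences as sets: order is ignored, elements are assumed unique.
    public class EnumerableComparer<T> : IEqualityComparer<IEnumerable<T>>
                                          where T : IComparable<T>
    {
        public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
        {
            if (ReferenceEquals(first, second))
                return true;
            if ((first == null) || (second == null))
                return false;

            return new HashSet<T>(first).SetEquals(second);
        }

        public int GetHashCode(IEnumerable<T> enumerable)
        {
            // Sort first so that every ordering of the same elements hashes alike.
            // unchecked: the multiply-add is meant to wrap around on overflow.
            unchecked
            {
                return enumerable.OrderBy(x => x)
                  .Aggregate(17, (current, val) => current*23 + val.GetHashCode());
            }
        }
    }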

So you'd do something like:

list.Distinct(new EnumerableComparer<int>());

If the elements are not guaranteed to be unique, use the IEqualityComparer I posted here: Comparing two collections for equality irrespective of the order of items in them
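For illustration, one way to handle repeated elements is to compare sorted copies of both sequences. The sketch below is not the linked implementation; `MultisetComparer<T>` is a hypothetical name, and it assumes that `T`'s ordering is consistent with its `Equals`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer (not the linked one): treats each sequence as a multiset,
// so { 1, 1, 2 } equals { 2, 1, 1 } but not { 1, 2, 2 }.
public sealed class MultisetComparer<T> : IEqualityComparer<IEnumerable<T>>
    where T : IComparable<T>
{
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (ReferenceEquals(first, second)) return true;
        if (first == null || second == null) return false;

        // Sorting puts both sequences into a canonical order, duplicates included.
        return first.OrderBy(x => x).SequenceEqual(second.OrderBy(x => x));
    }

    public int GetHashCode(IEnumerable<T> items)
    {
        if (items == null) return 0;

        unchecked
        {
            // A wrapping sum is order-independent and, unlike XOR,
            // does not cancel out elements that appear twice.
            int hash = 0;
            foreach (T item in items)
                hash += item == null ? 0 : item.GetHashCode();
            return hash;
        }
    }
}
```

Sorting costs O(n log n) per comparison; counting occurrences in a dictionary would avoid the `IComparable<T>` constraint at the price of a little more code.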

(In previous edits, I mistakenly posted an IEqualityComparer that compares two lists of lists. That could be very useful when dealing with partitions, but it's a different topic.)

Ohad Schneider answered Sep 22 '22 16:09