Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq get values not shared across multiple lists

What's the most efficient way to write a method that will compare n lists and return all the values that do not appear in all lists, so that

var lists = new List<List<int>> {
                                  new List<int> { 1, 2, 3, 4 },
                                  new List<int> { 2, 3, 4, 5, 8 },
                                  new List<int> { 2, 3, 4, 5, 9, 9 },
                                  new List<int> { 2, 3, 3, 4, 9, 10 }
                                };


public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
  //...fast algorithm here
}

so that

lists.GetNonShared();

returns 1, 5, 8, 9, 10

I had

public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
  return list.SelectMany(item => item)
             .Except(lists.Aggregate((a, b) => a.Intersect(b));
}

But I wasn't sure if that was efficient. Order does not matter. Thanks!

like image 395
Brad Urani Avatar asked Sep 15 '11 19:09

Brad Urani


2 Answers

        public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
        {
           return list.SelectMany(x => x.Distinct()).GroupBy(x => x).Where(g => g.Count() < list.Count()).Select(group => group.Key);
        }
like image 190
mironych Avatar answered Sep 22 '22 18:09

mironych


EDIT: I think I'd think of it like this...

You want the union of all the lists, minus the intersection of all the lists. That's effectively what your original does, leaving Except to do the "set" operation of Union despite getting duplicate inputs. In this case I suspect you could do this more efficiently just building up two HashSets and doing all the work in-place:

public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{        
    using (var iterator = lists.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            return new T[0]; // Empty
        }

        HashSet<T> union = new HashSet<T>(iterator.Current.ToList());
        HashSet<T> intersection = new HashSet<T>(union);
        while (iterator.MoveNext())
        {
            // This avoids iterating over it twice; it may not be necessary,
            // it depends on how you use it.
            List<T> list = iterator.Current.Toist();
            union.UnionWith(list);
            intersection = intersection.IntersectWith(list);
        }
        union.ExceptWith(intersection);
        return union;
    }
}

Note that this is now eager, not deferred.


Here's an alternative option:

public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
    return list.SelectMany(list => list)
               .GroupBy(x => x)
               .Where(group => group.Count() < lists.Count)
               .Select(group => group.Key);
}

If it's possible for a list to contain the same item more than once, you'd want a Distinct call in there:

public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
    return list.SelectMany(list => list.Distinct())
               .GroupBy(x => x)
               .Where(group => group.Count() < list.Count)
               .Select(group => group.Key);
}

EDIT: Now I've corrected this, I understand your original code... and I suspect I can find something better... thinking...

like image 38
Jon Skeet Avatar answered Sep 22 '22 18:09

Jon Skeet