What's the most efficient way to write a method that will compare n lists and return all the values that do not appear in all lists, so that
var lists = new List<List<int>> {
new List<int> { 1, 2, 3, 4 },
new List<int> { 2, 3, 4, 5, 8 },
new List<int> { 2, 3, 4, 5, 9, 9 },
new List<int> { 2, 3, 3, 4, 9, 10 }
};
public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
//...fast algorithm here
}
so that
lists.GetNonShared();
returns 1, 5, 8, 9, 10
I had
public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
return list.SelectMany(item => item)
.Except(lists.Aggregate((a, b) => a.Intersect(b));
}
But I wasn't sure if that was efficient. Order does not matter. Thanks!
public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
{
return list.SelectMany(x => x.Distinct()).GroupBy(x => x).Where(g => g.Count() < list.Count()).Select(group => group.Key);
}
EDIT: I think I'd think of it like this...
You want the union of all the lists, minus the intersection of all the lists. That's effectively what your original does, leaving Except
to do the "set" operation of Union
despite getting duplicate inputs. In this case I suspect you could do this more efficiently just building up two HashSet
s and doing all the work in-place:
public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
using (var iterator = lists.GetEnumerator())
{
if (!iterator.MoveNext())
{
return new T[0]; // Empty
}
HashSet<T> union = new HashSet<T>(iterator.Current.ToList());
HashSet<T> intersection = new HashSet<T>(union);
while (iterator.MoveNext())
{
// This avoids iterating over it twice; it may not be necessary,
// it depends on how you use it.
List<T> list = iterator.Current.Toist();
union.UnionWith(list);
intersection = intersection.IntersectWith(list);
}
union.ExceptWith(intersection);
return union;
}
}
Note that this is now eager, not deferred.
Here's an alternative option:
public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
return list.SelectMany(list => list)
.GroupBy(x => x)
.Where(group => group.Count() < lists.Count)
.Select(group => group.Key);
}
If it's possible for a list to contain the same item more than once, you'd want a Distinct call in there:
public IEnumerable<T> GetNonShared(this IEnumerable<IEnumerable<T>> lists)
{
return list.SelectMany(list => list.Distinct())
.GroupBy(x => x)
.Where(group => group.Count() < list.Count)
.Select(group => group.Key);
}
EDIT: Now I've corrected this, I understand your original code... and I suspect I can find something better... thinking...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With