Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Items Common to Most Lists

Tags:

c#

list

Given a list of lists (let's say 5 lists, to have a real number with which to work), I can find items that are common to all 5 lists with relative ease (see Intersection of multiple lists with IEnumerable.Intersect()) using a variation of the following code:

var list1 = new List<int>() { 1, 2, 3 };
var list2 = new List<int>() { 2, 3, 4 };
var list3 = new List<int>() { 3, 4, 5 };
var listOfLists = new List<List<int>>() { list1, list2, list3 };
var intersection = listOfLists.Aggregate((previousList, nextList) => previousList.Intersect(nextList).ToList());

Now let's say that intersection ends up containing 0 items. It's quite possible that there are some objects that are common to 4/5 lists. How would I go about finding them in the most efficient way?

I know I could just run through all the combinations of 4 lists and save all the results, but that method doesn't scale very well (this will eventually have to be done on approx. 40 lists).

If no item is common to 4 lists, then the search would be repeated looking for items common to 3/5 lists, etc. Visually, this could be represented by lists of grid points and we're searching for the points that have the most overlap.

Any ideas?

EDIT: Maybe it would be better to look at each point and keep track of how many times it appears in each list, then create a list of the points with the highest occurrence?

like image 625
Kyle G. Avatar asked Mar 23 '23 03:03

Kyle G.


2 Answers

You can select all numbers (points) from all lists, and group them by value. Then sort result by group size (i.e. lists count where point present) and select most common item:

var mostCommon = listOfLists.SelectMany(l => l)
                            .GroupBy(i => i)
                            .OrderByDescending(g => g.Count())
                            .Select(g => g.Key)
                            .First();
// outputs 3

Instead of taking only first item, you can take several top items by replacing First() with Take(N).


Returning items with number of lists (ordered by number of lists):

var mostCommonItems = from l in listOfLists
                      from i in l
                      group i by i into g
                      orderby g.Count() descending
                      select new {
                         Item = g.Key,
                         NumberOfLists = g.Count()
                      };

Usage (item is a strongly-typed anonymous object):

var topItem = mostCommonItems.First();
var item = topItem.Item;
var listsCount = topItem.NumberOfLists;

foreach(var item in mostCommonItems.Take(3))
    // iterate over top three items
like image 67
Sergey Berezovskiy Avatar answered Mar 31 '23 20:03

Sergey Berezovskiy


You can first combine all the lists, then find the Mode of the list using a dictionary strategy as follows. This makes it pretty fast:

/// <summary>
/// Gets the element that occurs most frequently in the collection.
/// </summary>
/// <param name="list"></param>
/// <returns>Returns the element that occurs most frequently in the collection.
/// If all elements occur an equal number of times, a random element in
/// the collection will be returned.</returns>
public static T Mode<T>(this IEnumerable<T> list) 
{
    // Initialize the return value
    T mode = default(T);

    // Test for a null reference and an empty list
    if (list != null && list.Count() > 0)
    {
        // Store the number of occurences for each element
        Dictionary<T, int> counts = new Dictionary<T, int>();

        // Add one to the count for the occurence of a character
        foreach (T element in list)
        {
            if (counts.ContainsKey(element))
                counts[element]++;
            else
                counts.Add(element, 1);
        }

        // Loop through the counts of each element and find the 
        // element that occurred most often
        int max = 0;

        foreach (KeyValuePair<T, int> count in counts)
        {
            if (count.Value > max)
            {
                // Update the mode
                mode = count.Key;
                max = count.Value;
            }
        }
    }

    return mode;
}
like image 36
Ted Avatar answered Mar 31 '23 21:03

Ted