Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# fast Except method for sorted list

I am working on an application whose bottleneck is list1.Except(list2). From this post : should i use Except or Contains when dealing with HashSet or so in Linq the complexity of except is O(m+n) (m and n standing for the sizes of the lists). However, my lists are sorted. Can this help ?

The first implementation I can think of :

foreach element in list2 (m operations)
    look for it in list1 (ln(n) operations)
    if present 
        set it to null (O(1), removing has a O(n))
    else continue

It has complexity O(m*ln(n)), which is very interesting when m is small and n is large (this is exactly the case with my data sets : m is around 50, n is around 1 000 000). However the fact that it produces null may have a lot of implications on the functions using it... Is there any way to keep this complexity, without having to write nulls (and then keeping track of them)

Any help would be greatly appreciated!

like image 517
RUser4512 Avatar asked Sep 27 '22 04:09

RUser4512


2 Answers

using System;
using System.Collections.Generic;

public class Test
{
    public static void Main()
    {
        var listM = new List<int>();
        var listN = new List<int>();
        for(int i = 0, x = 0; x < 50; i+=13, x++) { 
            listM.Add(i);
        }
        for(int i = 0, x = 0; x < 10000; i+=7, x++) { 
            listN.Add(i);
        }
        Console.WriteLine(SortedExcept(listM, listN).Count);
    }

    public static List<T> SortedExcept<T>(List<T> m, List<T> n) {
        var result = new List<T>();
        foreach(var itm in m) {
            var index = n.BinarySearch(itm);
            if(index < 0) { 
                result.Add(itm); 
            }
        }
        return result;
    }
}

EDIT Here's the O(M + N) version as well

public static List<T> SortedExcept2<T>(List<T> m, List<T> n) where T : IComparable<T> {
    var result = new List<T>();
    int i = 0, j = 0;
    if(n.Count == 0) {
        result.AddRange(m);
        return result;
    }
    while(i < m.Count) {
        if(m[i].CompareTo(n[j]) < 0) {
            result.Add(m[i]);
            i++;
        } else if(m[i].CompareTo(n[j]) > 0) {
            j++;
        } else {
            i++;
        }
        if(j >= n.Count) {
            for(; i < m.Count; i++) { 
                result.Add(m[i]); 
            }
            break;
        }
    }
    return result;
}

In a quick & dirty benchmark http://ideone.com/Y2oEQD M + N is always faster even when N is 10million. BinarySearch suffers penalties because it's accessing array memory in a non-linear fashion; this causes cache misses which slows down the algorithm, so the larger N gets the more memory access penalties with BinarySearch.

like image 80
Louis Ricci Avatar answered Oct 13 '22 13:10

Louis Ricci


If both lists are sorted, then you can implement your own solution easily:

listA except listB algorithm then works as follows:

1. Start from the beginning of both lists
2. If listA element is smaller than the listB element,
   then include the listA element in the output and advance listA
3. If listB element is smaller than the listA element, advance listB
4. If listA and listB elements are equal,
   advance both lists and do not push the element to the output

Repeat until listA is exhausted. Take special care that listB might be exhausted before listA.

like image 29
Zoran Horvat Avatar answered Oct 13 '22 13:10

Zoran Horvat