Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Function timed out for large list (LINQ query in C#)

I am using the following query

var queryList1Only = (from file in list1
                                  select file).Except(list2, myFileCompare);

while myFileCompare does a comparison of 2 files based on the name and length.

The query was returning the results if the list1 and list2 were small (say 100 files while I tested), then I increased the list1 to 30,000 files and list2 to 20,000 files and the query now says "Function Evaluation Timed Out".

I searched online and found debugging could cause it, so I removed all the breakpoints and ran the code, now the program just froze, without any output for queryList1Only I am trying to print out to check it.

EDIT: This is the code for myFileCompare

class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
    {

        public FileCompare() { }

        public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
        {
            return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name && 
                    f1.Length == f2.Length);
        }

        // Return a hash that reflects the comparison criteria. According to the 
        // rules for IEqualityComparer<T>, if Equals is true, then the hash codes must
        // also be equal. Because equality as defined here is a simple value equality, not
        // reference identity, it is possible that two or more objects will produce the same
        // hash code.
        public int GetHashCode(System.IO.FileInfo fi)
        {
            string s = String.Format("{0}{1}", fi.Name, fi.Length);
            return s.GetHashCode();
        }

    }
like image 282
superstar Avatar asked Aug 17 '11 19:08

superstar


2 Answers

What are you need to do with the items returned by a query? Basically such heavy operations would be great to execute simultaneously in a separate thread to avoid the situations you've just faced.

EDIT: An idea

As a case you can try following algorithm:

  • Sort items in both arrays using QuickSort (List<T>.Sort() uses it by default), it will be pretty fast with good implementation of GetHashCode()
  • Then in well known for() loop traverse list and compare elements with the same index
  • When count of any array reaches maximum index of an other list - select all items from latter list as different (basically they are not exists in former list at all).

I believe with sorted arrays you'll give much better performance. I believe complexity of Except() is O(m*n).

EDIT: An other idea, should be really fast

  • From one server store items in Set<T>
  • Then loop through second array and search within a Set<T>, it would be VERY fast! Basically O(mlogm) + O(n) because you need to traverse only single array and search within a set with good hash function (use GetHashCode() I've provided with an updated logic) is very quick. Try it out!
// some kind of C# pseudocode ;)
public IEnumerable<FileInfo> GetDifference()
{           
    ISet<FileInfo> firstServerFilesMap = new HashSet<FileInfo>();

    // adding items to set
    firstServerFilesMap.Add();

    List<FileInfo> secondServerFiles = new List<FileInfo>();

    // adding items to list
    firstServerFilesMap.Add();

    foreach (var secondServerFile in secondServerFiles)
    {
        if (!firstServerFilesMap.Contains(secondServerFile))
        {
            yield return secondServerFile;
        }
    }
}

EDIT: More details regarding equality logic were provided in comments

Try out this impelmentation

public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
{
      if ( f1 == null || f2 == null)
      {
          return false;
      }

      return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name && 
             f1.Length == f2.Length);
}

public int GetHashCode(System.IO.FileInfo fi)
{
    unchecked
    {
        int hash = 17;    
        hash = hash * 23 + fi.Name.GetHashCode();
        hash = hash * 23 + fi.Directory.Name.GetHashCode();
        hash = hash * 23 + fi.Length.GetHashCode();

        return hash;
    }
}

Useful links:

  • GetHashCode Guidelines in C#
  • What is the best algorithm for an overridden System.Object.GetHashCode?
like image 107
sll Avatar answered Nov 13 '22 20:11

sll


I haven't tried this myself, but here is an idea: Implement list1 as HashSet, this way:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(myFileCompare);

Add all files:

foreach(var file in files)
{
    List1.Add(file);
}

Then remove elements:

List1.ExceptWith(list2);

Then enumerate:

foreach(var file in List1)
{
    //do something
}

I think it's faster, but as I said, I haven't tried it. Here is a link with general information on HashSet.

Edit: Or better yet, you can initialize and add data in one step:

HashSet<FileInfo> List1 = new HashSet<FileInfo>(files, myFileCompare);
like image 26
Francisco Avatar answered Nov 13 '22 22:11

Francisco