I am using the following query
var queryList1Only = (from file in list1
                      select file).Except(list2, myFileCompare);
where myFileCompare compares two files based on their name and length.
The query returned results when list1 and list2 were small (around 100 files each while I was testing). Then I increased list1 to 30,000 files and list2 to 20,000 files, and the query now says "Function Evaluation Timed Out".
I searched online and found that debugging can cause this, so I removed all the breakpoints and ran the code. Now the program just freezes, with no output for queryList1Only, which I am trying to print out to check.
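For reference, this is roughly the overall shape of the code (a sketch only: the directory paths are placeholders and the lists in my real code are populated elsewhere; FileCompare is the comparer shown in the edit below):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Placeholder setup: however list1/list2 are actually populated,
// they end up as List<FileInfo>.
var dir1 = new DirectoryInfo(@"C:\folder1");
var dir2 = new DirectoryInfo(@"C:\folder2");
List<FileInfo> list1 = dir1.GetFiles("*.*", SearchOption.AllDirectories).ToList();
List<FileInfo> list2 = dir2.GetFiles("*.*", SearchOption.AllDirectories).ToList();

var myFileCompare = new FileCompare();
var queryList1Only = (from file in list1
                      select file).Except(list2, myFileCompare);

// Printing is what forces the deferred query to actually execute.
foreach (var file in queryList1Only)
{
    Console.WriteLine(file.FullName);
}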
EDIT: This is the code for myFileCompare
class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
{
    public FileCompare() { }

    public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
    {
        return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name &&
                f1.Length == f2.Length);
    }

    // Return a hash that reflects the comparison criteria. According to the
    // rules for IEqualityComparer<T>, if Equals is true, then the hash codes must
    // also be equal. Because equality as defined here is a simple value equality,
    // not reference identity, it is possible that two or more objects will
    // produce the same hash code.
    public int GetHashCode(System.IO.FileInfo fi)
    {
        string s = String.Format("{0}{1}", fi.Name, fi.Length);
        return s.GetHashCode();
    }
}
What do you need to do with the items returned by the query? Heavy operations like this are best executed on a separate thread, which avoids the kind of freeze you've just run into.
EDIT: An idea
You could try the following algorithm:
1. Sort both lists with QuickSort (List<T>.Sort() uses it by default); with a good comparison on name/length this will be pretty fast.
2. Traverse the sorted lists with a for() loop, comparing elements as you go.
I believe sorted arrays will give you much better performance; Except() can degrade toward O(m*n) when the hash function collides a lot. A sketch of this sort-and-scan idea follows.
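Here is a rough sketch of that sort-and-scan approach (my own illustration, assuming both lists are List<FileInfo> and using the same name/directory/length criteria as the comparer; note that it reorders both lists):
using System;
using System.Collections.Generic;
using System.IO;

static IEnumerable<FileInfo> SortedDifference(List<FileInfo> list1, List<FileInfo> list2)
{
    // Order both lists by the same key the comparer uses: name, directory name, length.
    Comparison<FileInfo> byKey = (a, b) =>
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        if (c != 0) return c;
        c = string.CompareOrdinal(a.Directory.Name, b.Directory.Name);
        if (c != 0) return c;
        return a.Length.CompareTo(b.Length);
    };

    list1.Sort(byKey);   // List<T>.Sort() sorts in place
    list2.Sort(byKey);

    // Walk both sorted lists in parallel (merge style) and yield files only in list1.
    int i = 0, j = 0;
    while (i < list1.Count)
    {
        if (j >= list2.Count)
        {
            yield return list1[i++];                // list2 exhausted, the rest is list1-only
        }
        else
        {
            int cmp = byKey(list1[i], list2[j]);
            if (cmp < 0) yield return list1[i++];   // present only in list1
            else if (cmp > 0) j++;                  // present only in list2, skip
            else { i++; j++; }                      // present in both
        }
    }
}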
EDIT: Another idea, which should be really fast
Put the files from one server into a Set<T> (a HashSet<T>); it will be VERY fast. The cost is roughly O(m) to build the set plus O(n) to traverse the other list, because each lookup in a set with a good hash function (use the GetHashCode() I've provided in the update below) is very quick. Try it out!
// some kind of C# pseudocode ;)
public IEnumerable<FileInfo> GetDifference()
{
    // Put the first server's files into a set, passing the custom comparer so that
    // lookups use the name/directory/length logic instead of reference equality.
    ISet<FileInfo> firstServerFilesMap = new HashSet<FileInfo>(myFileCompare);
    // ... add the first server's files here: firstServerFilesMap.Add(file);

    List<FileInfo> secondServerFiles = new List<FileInfo>();
    // ... add the second server's files here: secondServerFiles.Add(file);

    // Yield the files the second server has that the first server does not.
    foreach (var secondServerFile in secondServerFiles)
    {
        if (!firstServerFilesMap.Contains(secondServerFile))
        {
            yield return secondServerFile;
        }
    }
}
EDIT: More details regarding the equality logic were provided in the comments.
Try out this implementation:
public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
{
    if (f1 == null || f2 == null)
    {
        return false;
    }
    return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name &&
            f1.Length == f2.Length);
}

public int GetHashCode(System.IO.FileInfo fi)
{
    unchecked
    {
        int hash = 17;
        hash = hash * 23 + fi.Name.GetHashCode();
        hash = hash * 23 + fi.Directory.Name.GetHashCode();
        hash = hash * 23 + fi.Length.GetHashCode();
        return hash;
    }
}
I haven't tried this myself, but here is an idea: implement list1 as a HashSet<FileInfo>, this way:
HashSet<FileInfo> List1 = new HashSet<FileInfo>(myFileCompare);
Add all files:
foreach (var file in files)
{
    List1.Add(file);
}
Then remove elements:
List1.ExceptWith(list2);
Then enumerate:
foreach (var file in List1)
{
    // do something
}
I think it's faster, but as I said, I haven't tried it. The HashSet<T> documentation has general information on how it works.
Edit: Or better yet, you can initialize and add data in one step:
HashSet<FileInfo> List1 = new HashSet<FileInfo>(files, myFileCompare);
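Putting those pieces together, a minimal end-to-end sketch might look like this (my assumptions: files and list2 are sequences of FileInfo, and myFileCompare is the comparer from the question):
using System;
using System.Collections.Generic;
using System.IO;

static void PrintList1Only(IEnumerable<FileInfo> files,
                           IEnumerable<FileInfo> list2,
                           IEqualityComparer<FileInfo> myFileCompare)
{
    // Initialize the set and add the data in one step, using the custom comparer.
    var list1Set = new HashSet<FileInfo>(files, myFileCompare);

    // Remove everything that also appears in list2 (the same comparer is used for lookups).
    list1Set.ExceptWith(list2);

    // What remains are the files present only in list1.
    foreach (var file in list1Set)
    {
        Console.WriteLine("{0} ({1} bytes)", file.FullName, file.Length);
    }
}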