I have a simple method that compares an array of FileInfo objects against a list of filenames to check which files have already been processed. The unprocessed files are then returned.
The loop in this method iterates over about 250,000 FileInfo objects and is taking an obscene amount of time to complete.
The inefficiency is obviously the Contains method call on the processedFiles collection.
First, how can I check whether my suspicion about the cause is true, and second, how can I improve the method to speed the process up?
public static List<FileInfo> GetUnprocessedFiles(FileInfo[] allFiles, List<string> processedFiles)
{
    List<FileInfo> unprocessedFiles = new List<FileInfo>();
    foreach (FileInfo fileInfo in allFiles)
    {
        if (!processedFiles.Contains(fileInfo.Name))
        {
            unprocessedFiles.Add(fileInfo);
        }
    }
    return unprocessedFiles;
}
A List<T>'s Contains method runs in linear time, since it potentially has to enumerate the entire list to prove the existence or non-existence of an item. I would suggest you use a HashSet<string> or similar instead. A HashSet<T>'s Contains method is designed to run in constant O(1) time, i.e. it shouldn't depend on the number of items in the set.
This small change should make the entire method run in linear time:
public static List<FileInfo> GetUnprocessedFiles(FileInfo[] allFiles,
                                                 List<string> processedFiles)
{
    List<FileInfo> unprocessedFiles = new List<FileInfo>();
    HashSet<string> processedFileSet = new HashSet<string>(processedFiles);
    foreach (FileInfo fileInfo in allFiles)
    {
        if (!processedFileSet.Contains(fileInfo.Name))
        {
            unprocessedFiles.Add(fileInfo);
        }
    }
    return unprocessedFiles;
}
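To answer the first part of the question, a direct way to check the suspicion is to time List<T>.Contains against HashSet<T>.Contains with a Stopwatch. This is a sketch with made-up file names and a smaller collection than the real 250,000; on a list this size the difference is already visible:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class ContainsTiming
{
    static void Main()
    {
        // Build a synthetic processed-files list (names are made up).
        var processedFiles = new List<string>();
        for (int i = 0; i < 50000; i++)
            processedFiles.Add("file" + i + ".txt");

        // A mix of hits and misses; misses force a full scan of the list.
        var lookups = new[] { "file1.txt", "file49999.txt", "missing.txt" };

        var sw = Stopwatch.StartNew();
        foreach (var name in lookups)
            processedFiles.Contains(name);   // O(n) linear scan per call
        sw.Stop();
        Console.WriteLine("List.Contains:    " + sw.Elapsed);

        var processedSet = new HashSet<string>(processedFiles);
        sw.Restart();
        foreach (var name in lookups)
            processedSet.Contains(name);     // O(1) hash lookup per call
        sw.Stop();
        Console.WriteLine("HashSet.Contains: " + sw.Elapsed);
    }
}
```

A full profiler gives a more complete picture, but a targeted timing like this is usually enough to confirm where the time goes.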
I would suggest three improvements, if possible:

1. Accept an ISet<T> as a parameter. This way, you won't have to reconstruct the set every time.
2. Avoid mixing representations (string and FileInfo) in this fashion. Pick one and go with it.
3. Consider using the HashSet<T>.ExceptWith method instead of doing the looping yourself. Bear in mind that this will mutate the collection.

If you can use LINQ, and you can afford to build up a set on every call, here's another way:
public static IEnumerable<string> GetUnprocessedFiles
    (IEnumerable<string> allFiles, IEnumerable<string> processedFiles)
{
    // null-checks here
    return allFiles.Except(processedFiles);
}
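The HashSet<T>.ExceptWith suggestion from the list above can be sketched like this; the file names are made up, and note that ExceptWith removes items in place rather than returning a new collection:

```csharp
using System;
using System.Collections.Generic;

class ExceptWithSketch
{
    static void Main()
    {
        // Start from every file name, then subtract the processed ones.
        var unprocessed = new HashSet<string> { "a.txt", "b.txt", "c.txt" };
        var processed = new List<string> { "b.txt" };

        // ExceptWith mutates 'unprocessed', removing every element
        // that also appears in 'processed'.
        unprocessed.ExceptWith(processed);

        foreach (var name in unprocessed)
            Console.WriteLine(name);  // prints a.txt and c.txt
    }
}
```

This trades the extra result list for mutation of the source set, so it only fits if the caller no longer needs the original contents.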
I would try converting the processedFiles List to a HashSet. With a list, Contains needs to iterate the entire list every time you call it, while a HashSet lookup is an O(1) operation.