I have a non-performant method, how can I improve its efficiency?

I have a simple method that compares an array of FileInfo objects against a list of filenames to check which files have already been processed. The unprocessed list is then returned.

The loop in this method iterates over about 250,000 FileInfo objects, which takes an obscene amount of time to complete.

The inefficiency is obviously the Contains method call on the processedFiles collection.

First how can I check to make sure my suspicion is true about the cause and secondly, how can I improve the method to speed the process up?

public static List<FileInfo> GetUnprocessedFiles(FileInfo[] allFiles, List<string> processedFiles)
{
    List<FileInfo> unprocessedFiles = new List<FileInfo>();
    foreach (FileInfo fileInfo in allFiles)
    {
        if (!processedFiles.Contains(fileInfo.Name))
        {
            unprocessedFiles.Add(fileInfo);
        }
    }
    return unprocessedFiles;
}
Ant Swift asked Nov 04 '10 12:11

2 Answers

A List<T>'s Contains method runs in linear time, since it potentially has to enumerate the entire list to prove the existence or non-existence of an item. I would suggest you use a HashSet<string> or similar instead. A HashSet<T>'s Contains method is designed to run in constant O(1) time, i.e. it shouldn't depend on the number of items in the set.

This small change should make the entire method run in linear time:

public static List<FileInfo> GetUnprocessedFiles(FileInfo[] allFiles,
                                                 List<string> processedFiles)
{
    List<FileInfo> unprocessedFiles = new List<FileInfo>();
    HashSet<string> processedFileSet = new HashSet<string>(processedFiles);

    foreach (FileInfo fileInfo in allFiles)
    {
        if (!processedFileSet.Contains(fileInfo.Name))
        {
            unprocessedFiles.Add(fileInfo);
        }
    }

    return unprocessedFiles;
}

I would suggest 3 improvements, if possible:

  1. For extra efficiency, store the processed files in a set at the source, so that this method takes an ISet<T> as a parameter. This way, you won't have to reconstruct the set every time.
  2. Try not to mix and match different representations of the same entity (string and FileInfo) in this fashion. Pick one and go with it.
  3. You might also want to consider the HashSet<T>.ExceptWith method instead of doing the looping yourself. Bear in mind that this will mutate the collection.
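Point 3 can be sketched as follows; the collection contents here are invented for illustration. The key caveat is that ExceptWith mutates the set it is called on, removing every element that also appears in the other collection:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ExceptWithDemo
{
    static void Main()
    {
        // Hypothetical file names, just to show the behaviour.
        var allNames = new HashSet<string> { "a.txt", "b.txt", "c.txt" };
        var processed = new List<string> { "b.txt" };

        // ExceptWith removes processed entries from allNames in place.
        allNames.ExceptWith(processed);

        // Only the unprocessed names remain.
        Console.WriteLine(string.Join(",", allNames.OrderBy(n => n)));
    }
}
```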

If you can use LINQ, and you can afford to build up a set on every call, here's another way:

public static IEnumerable<string> GetUnprocessedFiles
 (IEnumerable<string> allFiles, IEnumerable<string> processedFiles)
{
  // null-checks here
  return allFiles.Except(processedFiles);     
}
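A quick usage sketch of the LINQ version (file names invented for the example). Enumerable.Except yields the distinct elements of the first sequence that do not appear in the second, preserving the first sequence's order, and it builds its own set internally, so each lookup is constant time:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ExceptDemo
{
    static IEnumerable<string> GetUnprocessedFiles(
        IEnumerable<string> allFiles, IEnumerable<string> processedFiles)
    {
        // null-checks omitted for brevity
        return allFiles.Except(processedFiles);
    }

    static void Main()
    {
        var all = new[] { "a.txt", "b.txt", "c.txt" };
        var processed = new[] { "a.txt" };

        Console.WriteLine(string.Join(",", GetUnprocessedFiles(all, processed)));
    }
}
```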
Ani answered Dec 12 '22 19:12

I would try converting the processedFiles List to a HashSet. With a list, Contains needs to iterate the entire list every time you call it; a HashSet lookup is an O(1) operation.
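To confirm the suspicion about Contains, a rough Stopwatch comparison before and after the conversion works; the sizes and file names below are made up, and real timings will vary by machine:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class ContainsTiming
{
    static void Main()
    {
        // Hypothetical processed-file names and a batch of lookups.
        var processedFiles = Enumerable.Range(0, 50000)
                                       .Select(i => "file" + i + ".txt")
                                       .ToList();
        var lookups = Enumerable.Range(0, 5000)
                                .Select(i => "file" + (i * 7) + ".txt")
                                .ToList();

        var sw = Stopwatch.StartNew();
        int listHits = lookups.Count(name => processedFiles.Contains(name)); // O(n) per call
        sw.Stop();
        Console.WriteLine("List<string>.Contains: " + sw.ElapsedMilliseconds + " ms");

        var processedSet = new HashSet<string>(processedFiles); // one-time O(n) build
        sw.Restart();
        int setHits = lookups.Count(name => processedSet.Contains(name)); // O(1) per call
        sw.Stop();
        Console.WriteLine("HashSet<string>.Contains: " + sw.ElapsedMilliseconds + " ms");

        // Both approaches must agree on the result.
        Console.WriteLine(listHits == setHits);
    }
}
```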

Keith Rousseau answered Dec 12 '22 20:12