
C# FileInfo - Find duplicate Files

Tags: c#, file, fileinfo

I have a FileInfo array with ~200,000 entries. I need to find all files that share the same filename. For every duplicate file I need the directory name and the filename, because I want to rename them afterwards.

What I've tried already:

  • Comparing each entry with the whole list using two nested for loops // Bad idea, this would take hours or even days ^^
  • Trying LINQ sorting // I haven't used LINQ before, so I'm struggling to write the correct statement; maybe someone can help me :)
The_Holy_One asked Dec 16 '22 04:12


2 Answers

Sounds like this should do it:

var duplicateNames = files.GroupBy(file => file.Name)
                          .Where(group => group.Count() > 1)
                          .Select(group => group.Key);

Now would be a very good time to learn LINQ. It's incredibly useful - time spent learning it (even just LINQ to Objects) will pay itself back really quickly.

EDIT: Okay, if you want the original FileInfo for each group, just drop the select:

var duplicateGroups = files.GroupBy(file => file.Name)
                           .Where(group => group.Count() > 1);

// Replace with what you want to do
foreach (var group in duplicateGroups)
{
     Console.WriteLine("Files with name {0}", group.Key);
     foreach (var file in group)
     {
         Console.WriteLine("  {0}", file.FullName);
     }
}
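
Since the question asks for the directory name and the filename of every duplicate, here is a minimal sketch of how the grouping above could be used for that. The sample paths are hypothetical (FileInfo does not require the files to exist), and the comparison is case-sensitive, which is the default for GroupBy on strings.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sample data standing in for the ~200,000-entry array.
FileInfo[] files =
{
    new FileInfo(Path.Combine("a", "report.txt")),
    new FileInfo(Path.Combine("b", "report.txt")),
    new FileInfo(Path.Combine("b", "notes.txt")),
};

// One pass over the array: group by name, keep names that occur more than once.
var duplicateGroups = files
    .GroupBy(file => file.Name)
    .Where(group => group.Skip(1).Any())
    .ToList();

foreach (var group in duplicateGroups)
{
    foreach (var file in group)
    {
        // DirectoryName is the full path of the containing directory.
        Console.WriteLine("{0} | {1}", file.DirectoryName, file.Name);
    }
}
```

Calling ToList() materialises the groups once, so the array is not re-scanned each time the result is enumerated. Skip(1).Any() is just an alternative to Count() > 1 that stops counting after the second element.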
Jon Skeet answered Jan 01 '23 13:01


This should work:

HashSet<string> fileNamesSet = new HashSet<string>();
List<string> duplicates = new List<string>();

foreach(string fileName in fileNames)
{
    if(fileNamesSet.Contains(fileName))
    {
        duplicates.Add(fileName);
    }
    else
    {
        fileNamesSet.Add(fileName);
    }
}

Then duplicates will contain all the duplicate filenames. Every occurrence after the first is recorded, so a name that appears three times ends up in the list twice.

Note that since Windows file names are case-insensitive, you may wish to take this into account by converting all of the filenames to uppercase first using .ToUpperInvariant().
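
As an alternative to upper-casing every name, the sets can be given a case-insensitive comparer. The following is only a sketch under that assumption: the sample names are made up, and StringComparer.OrdinalIgnoreCase is one reasonable choice rather than an exact match for the file system's own rules. It also uses a second HashSet for the duplicates, so each repeated name is reported once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input; in the answer above this is the fileNames collection.
string[] fileNames = { "report.txt", "Report.TXT", "notes.txt", "REPORT.txt" };

// Both sets ignore case when comparing names.
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
var duplicates = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

foreach (string fileName in fileNames)
{
    // HashSet<T>.Add returns false when an equal element is already present.
    if (!seen.Add(fileName))
    {
        duplicates.Add(fileName);
    }
}

Console.WriteLine(string.Join(", ", duplicates));
```

This keeps the original spelling of the names instead of replacing them with upper-cased copies, which may matter if the names are used for renaming afterwards.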

sga101 answered Jan 01 '23 13:01