How to compare two folders for non identical files based on name?

Tags: c#, linq
I have two folders, A and B, each containing multiple files. I need to compare the files in A with the files in B and find the ones that do not appear in both, based on name. I tried the query below, but it returns almost the whole file list:

var filesnotinboth = from f1 in dir1.GetFiles("*", SearchOption.AllDirectories)
                     from f2 in dir2.GetFiles("*",SearchOption.AllDirectories)
                     where f1.Name != f2.Name
                     select f1.Name;

Any suggestions?

bala3569 asked Dec 28 '22 04:12


2 Answers

Well, for one thing, that approach is very inefficient: it calls dir2.GetFiles again for every f1. It also yields f1 once for every f2 whose name differs from it, so a file is output even if some other f2 matches it exactly. Imagine that dir1 contains A, B and C, and dir2 contains C and D. You'll end up like this:

f1    f2    Result of where?
 A     C    True
 A     D    True
 B     C    True
 B     D    True
 C     C    False
 C     D    True

So the result would be A, A, B, B, C - you'd still have C (which you didn't want) - just not quite as often as A and B.
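
To make the table above concrete, here is a minimal sketch that runs the same query shape over two in-memory arrays instead of real directories. The array names dir1Names and dir2Names are stand-ins invented for this example, not part of the original question.

```csharp
using System;
using System.Linq;

class CrossJoinDemo
{
    static void Main()
    {
        // Hypothetical stand-ins for the file names found in dir1 and dir2.
        string[] dir1Names = { "A", "B", "C" };
        string[] dir2Names = { "C", "D" };

        // Same shape as the query in the question: every (f1, f2) pair is
        // tested, and f1 is emitted once per pair whose names differ.
        var result = from f1 in dir1Names
                     from f2 in dir2Names
                     where f1 != f2
                     select f1;

        Console.WriteLine(string.Join(", ", result)); // A, A, B, B, C
    }
}
```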

You want to use set operations, like this:

var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var onlyIn1 = dir1Files.Except(dir2Files);

Now that should work, and more efficiently...
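
As a quick check of the set-based approach, the sketch below applies Except to in-memory name lists (again hypothetical stand-ins for the directory listings). It also shows that Except has set semantics: a name that occurs twice in the first sequence, for example in two subfolders, is reported once.

```csharp
using System;
using System.Linq;

class ExceptDemo
{
    static void Main()
    {
        // "A" appears twice, as it would if two subfolders held a file
        // with the same name (only Name is compared, not the full path).
        string[] dir1Names = { "A", "B", "C", "A" };
        string[] dir2Names = { "C", "D" };

        // Names present in the first list but absent from the second.
        var onlyIn1 = dir1Names.Except(dir2Names);

        Console.WriteLine(string.Join(", ", onlyIn1)); // A, B
    }
}
```

If names should be compared case-insensitively (an assumption about the file system, not something stated in the question), Except has an overload that takes an IEqualityComparer<string>, such as StringComparer.OrdinalIgnoreCase.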

EDIT: I've assumed you want files in A but not in B, based on possibly an earlier version of the question. (I'm not sure whether it was edited in the first five minutes. Obviously the current code isn't going to return anything in B but not A.)

If you want the symmetric difference, use HashSet<T>.SymmetricExceptWith:

var inExactlyOneDirectory = new HashSet<string>(dir1Files);
inExactlyOneDirectory.SymmetricExceptWith(dir2Files);

(Note that I dislike the fact that SymmetricExceptWith is a void method which mutates the existing set, instead of returning a new set or just a sequence. Aside from anything else, it means the variable name is only appropriate after the second statement, not the first.)
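
For readers who share that dislike of the mutating method, one possible non-mutating alternative is to combine two Except calls with Concat. This is only a sketch over hypothetical in-memory name lists; it enumerates each input twice, so with real directory listings you would probably want to materialize them first (for example with ToList).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SymmetricDifferenceDemo
{
    static void Main()
    {
        // Hypothetical stand-ins for the file names found in dir1 and dir2.
        string[] dir1Names = { "A", "B", "C" };
        string[] dir2Names = { "C", "D" };

        // Mutating version: the set starts as dir1's names and is changed in place.
        var set = new HashSet<string>(dir1Names);
        set.SymmetricExceptWith(dir2Names);

        // Non-mutating version: (1 minus 2) followed by (2 minus 1).
        var inExactlyOne = dir1Names.Except(dir2Names)
                                    .Concat(dir2Names.Except(dir1Names))
                                    .ToList();

        Console.WriteLine(string.Join(", ", inExactlyOne)); // A, B, D
        Console.WriteLine(set.SetEquals(inExactlyOne));     // True
    }
}
```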

EDIT: If you need uniqueness by name and size, you really need an anonymous type representing both. Unfortunately, it's then hard to create a HashSet<T> based on it. So you'll want an extension method like this:

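// Extension methods must be declared in a static class (name is arbitrary).
public static class EnumerableExtensions
{
    public static HashSet<T> ToHashSet<T>(this IEnumerable<T> set)
    {
        return new HashSet<T>(set);
    }
}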

Then:

var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => new { x.Name, x.Length });

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => new { x.Name, x.Length });

var difference = dir1Files.ToHashSet();
difference.SymmetricExceptWith(dir2Files);
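
This works because anonymous types compare by value: two instances with the same property names, types and values are equal and hash alike. The sketch below illustrates that with made-up file names and sizes rather than real FileInfo objects; MakeSet is a hypothetical local helper playing the role of the ToHashSet extension above.

```csharp
using System;
using System.Collections.Generic;

class NameAndSizeDemo
{
    // Hypothetical helper: lets the compiler infer the anonymous type T.
    static HashSet<T> MakeSet<T>(IEnumerable<T> items)
    {
        return new HashSet<T>(items);
    }

    static void Main()
    {
        // Made-up data; Length is a long to mirror FileInfo.Length.
        var dir1Files = new[]
        {
            new { Name = "a.txt", Length = 10L },
            new { Name = "b.txt", Length = 20L },
        };
        var dir2Files = new[]
        {
            new { Name = "a.txt", Length = 10L }, // same name and size: cancels out
            new { Name = "b.txt", Length = 25L }, // same name, different size: kept
        };

        var difference = MakeSet(dir1Files);
        difference.SymmetricExceptWith(dir2Files);

        Console.WriteLine(difference.Count); // 2
        foreach (var f in difference)
        {
            // Both versions of b.txt are reported, since their sizes differ.
            Console.WriteLine($"{f.Name} ({f.Length} bytes)");
        }
    }
}
```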
Jon Skeet answered Feb 05 '23 05:02


Jon Skeet's answer explains why your current solution doesn't work and why it is fundamentally inefficient.

As for solving the problem, one option would be to use the HashSet.SymmetricExceptWith method, which "modifies the current HashSet(Of T) object to contain only elements that are present either in that object or in the specified collection, but not both."

// Thanks to Jon Skeet for template
var dir1Files = dir1.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var dir2Files = dir2.GetFiles("*", SearchOption.AllDirectories)
                    .Select(x => x.Name);

var filesNotInBoth = new HashSet<string>(dir1Files);

filesNotInBoth.SymmetricExceptWith(dir2Files);
Ani answered Feb 05 '23 07:02