
C# - Compare Two Text Files

Background

I'm developing a simple Windows service that monitors certain directories for file-creation events and logs them. In short, the goal is to ascertain whether a file was copied from directory A to directory B. If a file is not in directory B after X time, an alert will be raised.

The issue is that the file itself is the only information I have when working out whether it has made its way to directory B. I'd like to assume that two files with the same name are the same file, but there are over 60 directory A's and a single directory B, and a file in one directory A may accidentally share its name with a file in another (by date or sequence), so this is not a safe assumption.

Example

Let's say, for example, I log that the file "E17999_XXX_2111.txt" was created in directory C:\Test. I would store the filename, file path, file creation date, file length and the BOM (byte order mark) for this file.

30 seconds later, I detect that the file "E17999_XXX_2111.txt" was created in directory C:\FinalDestination. Now I have the task of determining whether:

a) the file is the same one created in C:\Test, therefore I can update the first log as complete and stop worrying about it.

b) the file is not the same one and I somehow missed the previous steps, in which case I can ignore this file because it has already found its way to the destination directory.
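As an illustration of the log entry described in this example, here is a minimal sketch of a record that captures the filename, path, creation date, length and leading bytes (where a BOM would sit) at the moment a file is detected. The class and member names (FileLogEntry, FromPath, LeadingBytes, Completed) are hypothetical and not taken from the actual service.

```csharp
using System;
using System.IO;

// Hypothetical record of one file-creation event; all names are illustrative.
public sealed class FileLogEntry
{
    public string FileName { get; set; }
    public string DirectoryPath { get; set; }
    public DateTime CreatedUtc { get; set; }
    public long Length { get; set; }
    public byte[] LeadingBytes { get; set; }  // first bytes of the file, where a BOM would be
    public bool Completed { get; set; }       // set once the file is matched in the destination

    public static FileLogEntry FromPath(string path)
    {
        var info = new FileInfo(path);
        var leading = new byte[4];            // the longest common BOM (UTF-32) is 4 bytes
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            int n;
            // Read may return fewer bytes than requested, so loop until full or end of file.
            while (total < leading.Length &&
                   (n = stream.Read(leading, total, leading.Length - total)) > 0)
            {
                total += n;
            }
        }
        Array.Resize(ref leading, total);     // files shorter than 4 bytes keep what they have

        return new FileLogEntry
        {
            FileName = info.Name,
            DirectoryPath = info.DirectoryName,
            CreatedUtc = info.CreationTimeUtc,
            Length = info.Length,
            LeadingBytes = leading,
            Completed = false
        };
    }
}
```

Calling FileLogEntry.FromPath on the newly created file yields an entry that can later be compared against the file that appears in the destination directory.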

Research

So, in order to determine if the file created in the destination is exactly the same as the one created in the first instance, I've done a bit of research and found the following options:

a) filename compare

b) length compare

c) a creation-date compare

d) byte-for-byte compare

e) hash compare

Problems

a) As I said above, going by filename alone is too presumptuous.

b) Again, just because the length of the contents of a file is the same, it doesn't necessarily mean the files are actually the same.

c) The problem with this is that a copied file is technically a new file, so its creation date changes. I would want to set the first log as complete regardless of the time elapsed between the file appearing in directory A and directory B.

d) Aside from this method being extremely slow, there appears to be an issue if the second file's encoding has somehow changed, for example between ANSI and ASCII, which causes a byte mismatch for characters such as quotation marks.

I would rather not treat the file as different just because an ASCII ' has become an ANSI ', as it is near enough the same.

e) This seems to have the same drawbacks as a byte-for-byte compare.
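For reference, options b), d) and e) can be combined so that the expensive work only happens when the cheap check passes. The following is a rough sketch, not a tuned implementation: the names FileComparer, AreByteIdentical and Sha256Hex and the 64 KB buffer size are assumptions made for illustration. Note that both methods still treat an encoding change as a real difference.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Illustrative helper; the class name, method names and buffer size are assumptions.
public static class FileComparer
{
    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    private static int FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0, n;
        while (total < buffer.Length &&
               (n = stream.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += n;
        }
        return total;
    }

    // Length check first (cheap), then a chunked byte-for-byte comparison.
    public static bool AreByteIdentical(string pathA, string pathB)
    {
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        const int BufferSize = 64 * 1024;
        var bufferA = new byte[BufferSize];
        var bufferB = new byte[BufferSize];

        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = FillBuffer(a, bufferA);
                int readB = FillBuffer(b, bufferB);
                if (readA != readB) return false;
                if (readA == 0) return true;   // both streams ended with no mismatch

                for (int i = 0; i < readA; i++)
                {
                    if (bufferA[i] != bufferB[i]) return false;
                }
            }
        }
    }

    // SHA-256 digest as a hex string; it can be stored in the log when the file is first seen.
    public static string Sha256Hex(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Storing the digest in the log entry has the advantage that the original file in directory A does not need to exist any more when the copy turns up in directory B.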

EDIT

It appears the actual issue comes down to why the encoding differs between directories. I'm not currently able to access the code that deals with this part, so I can't tell why it happens, but I am looking to implement a solution that can compare files regardless of encoding to determine "real" differences (i.e. not those where a byte has changed due to encoding).
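One way to approach an encoding-tolerant comparison like this is to decode each file to text with the encoding it is believed to use, normalise the characters that tend to differ, and compare the resulting strings. The sketch below assumes the encodings are known or guessed by the caller; the names TextFileComparer, Normalize and HaveSameText, and the choice of which characters to map, are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative helper; names and the character mapping are assumptions.
public static class TextFileComparer
{
    // Maps typographic quotes to their plain ASCII look-alikes.
    public static string Normalize(string text)
    {
        return text
            .Replace('\u2018', '\'')   // left single quotation mark
            .Replace('\u2019', '\'')   // right single quotation mark
            .Replace('\u201C', '"')    // left double quotation mark
            .Replace('\u201D', '"');   // right double quotation mark
    }

    // Decodes both files with the supplied encodings and compares the normalised text.
    public static bool HaveSameText(string pathA, Encoding encodingA,
                                    string pathB, Encoding encodingB)
    {
        string a = Normalize(File.ReadAllText(pathA, encodingA));
        string b = Normalize(File.ReadAllText(pathB, encodingB));
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

For an ANSI source file, Encoding.GetEncoding(1252) would be the natural argument. On .NET Core and later, that code page is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).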

SOLUTION

I've now managed to resolve this. I first run the comparison suggested by @Magnus; if that fails to find a match, I convert the text of both files to ASCII to remove any bad data and compare the results with SequenceEqual. Code below:

using System.IO;
using System.Linq;
using System.Text;

// sourcePath and destinationPath are placeholder names for the two files being compared.
// Converting to ASCII replaces characters ASCII cannot represent (typically with '?').
byte[] bytes1 = Encoding.Convert(Encoding.GetEncoding(1252), Encoding.ASCII, Encoding.GetEncoding(1252).GetBytes(File.ReadAllText(sourcePath)));
byte[] bytes2 = Encoding.Convert(Encoding.GetEncoding(1252), Encoding.ASCII, Encoding.GetEncoding(1252).GetBytes(File.ReadAllText(destinationPath)));

if (Encoding.ASCII.GetChars(bytes1).SequenceEqual(Encoding.ASCII.GetChars(bytes2)))
{
    // matched!
}

Thanks for the help!

Danny Lager asked Oct 20 '15 18:10



1 Answer

You would then have to compare the string content of the files. The StreamReader (which ReadLines uses) detects the encoding from a byte order mark when one is present, and otherwise assumes UTF-8.

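// SequenceEqual is a LINQ extension method, so this needs: using System.Linq;
var areEquals = System.IO.File.ReadLines("c:\\file1.txt").SequenceEqual(
                System.IO.File.ReadLines("c:\\file2.txt"));

Note that ReadLines enumerates the file lazily, so it will not read the complete file into memory.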
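If the two files are known to use different encodings and carry no byte order mark, the same comparison can be given the encodings explicitly through the ReadLines overload that takes an Encoding. This is a small sketch; the LineComparer and LinesEqual names are made up for the example, and the caller is assumed to know which encoding each file uses.

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Illustrative helper; the names are assumptions.
public static class LineComparer
{
    // Streams both files line by line and stops at the first line that differs.
    public static bool LinesEqual(string pathA, Encoding encodingA,
                                  string pathB, Encoding encodingB)
    {
        return File.ReadLines(pathA, encodingA)
                   .SequenceEqual(File.ReadLines(pathB, encodingB));
    }
}
```

Because ReadLines strips the line terminators, this also treats files that differ only in CRLF versus LF line endings as equal.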

Magnus answered Nov 07 '22 07:11
