Most efficient way to compare a MemoryStream to a file (C# .NET)

Question

I have a MemoryStream containing the bytes of a PNG-encoded image, and want to check if there is an exact duplicate of that image data in a directory on disk. The first obvious step is to only look for files that match the exact length, but after this I'd like to know what's the most efficient way to compare the memory against the files. I'm not very experienced working with streams.
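The length filter described above could be sketched like this. It is a minimal illustration, assuming only the top level of the directory is searched; `CandidateFinder` and `FindSameLengthFiles` are hypothetical names, not an existing API.

```csharp
using System.Collections.Generic;
using System.IO;

static class CandidateFinder
{
    // Hypothetical helper: yields the full paths of files directly inside
    // `directory` whose size in bytes equals the length of `image`.
    public static IEnumerable<string> FindSameLengthFiles(MemoryStream image, string directory)
    {
        long length = image.Length;
        foreach (FileInfo file in new DirectoryInfo(directory).EnumerateFiles())
        {
            // Only the file size is checked here; no contents are read yet.
            if (file.Length == length)
                yield return file.FullName;
        }
    }
}
```

Only the files this yields need their contents compared.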

I had a couple thoughts on the matter:

First, if I could get a hash code for the file, it would (presumably) be more efficient to compare hash codes rather than every byte of the image. Similarly, I could compare just some of the bytes of the image, giving a "close-enough" answer.

And then of course I could just compare the entire stream, but I don't know how quick that would be.

What's the best way to compare a MemoryStream to a file? Byte-by-byte in a for-loop?

CodeMonkey1313 · Accepted Answer

Another solution:

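using System.IO;
using System.Linq;

// Compares the full contents of two MemoryStreams.
private static bool CompareMemoryStreams(MemoryStream ms1, MemoryStream ms2)
{
    // Streams of different length cannot hold the same data.
    if (ms1.Length != ms2.Length)
        return false;

    // Not required by ToArray() (which copies the whole stream regardless
    // of Position), but leaves both streams rewound for later reads.
    ms1.Position = 0;
    ms2.Position = 0;

    var msArray1 = ms1.ToArray();
    var msArray2 = ms2.ToArray();

    // SequenceEqual (System.Linq) stops at the first differing byte.
    return msArray1.SequenceEqual(msArray2);
}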
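The question compares against a file rather than a second MemoryStream, so the same idea can be adapted by loading the file into a byte array. This is a sketch assuming each candidate file is small enough (a single image) to read fully into memory; `FileMatcher` and `MatchesFile` are hypothetical names.

```csharp
using System.IO;
using System.Linq;

static class FileMatcher
{
    // Hypothetical helper: true if the file at `path` holds exactly the bytes in `ms`.
    public static bool MatchesFile(MemoryStream ms, string path)
    {
        // Cheap check first: a size mismatch rules the file out
        // without reading its contents.
        if (new FileInfo(path).Length != ms.Length)
            return false;

        // Load the whole file and compare it with the stream's contents.
        byte[] fileBytes = File.ReadAllBytes(path);
        return ms.ToArray().SequenceEqual(fileBytes);
    }
}
```

This reads the whole file before comparing, so it cannot stop early while reading; the buffered approach described in the next answer avoids that.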

Tomas Petricek · Answer

Firstly, getting a hash code of the two streams won't help: to calculate a hash code, you'd need to read the entire contents and perform some calculation while reading. If you compare the files byte-by-byte or using buffers, you can stop early, as soon as you find the first bytes or blocks that don't match.

However, hashing would make sense if you needed to compare the MemoryStream against multiple files, because then you'd need to read through the MemoryStream just once (to calculate its hash code) and then loop through all the files.
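For the multi-file case, a sketch of hashing the MemoryStream once and then hashing each candidate file might look like this. SHA-256 from System.Security.Cryptography is an assumption (the answer doesn't name a hash function), and `HashMatcher`/`FindDuplicates` are hypothetical names. Each file still has to be read in full to hash it; the saving is that the MemoryStream is processed only once. Equal hashes make a duplicate extremely likely; a byte comparison can confirm it if an exact guarantee is required.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashMatcher
{
    // Hypothetical helper: returns the paths whose SHA-256 hash equals the
    // hash of the MemoryStream's contents.
    public static List<string> FindDuplicates(MemoryStream ms, IEnumerable<string> paths)
    {
        using (SHA256 sha = SHA256.Create())
        {
            // Hash the in-memory image once; ToArray() ignores Position.
            byte[] target = sha.ComputeHash(ms.ToArray());

            var matches = new List<string>();
            foreach (string path in paths)
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    // ComputeHash(Stream) reads the file to the end.
                    byte[] hash = sha.ComputeHash(fs);
                    if (hash.SequenceEqual(target))
                        matches.Add(path);
                }
            }
            return matches;
        }
    }
}
```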

In any case, you'll have to write code to read the entire file. As you mentioned, this can be done either byte-by-byte or using buffers. Reading data into a buffer is a good idea, because reading larger blocks (e.g. 1 kB at a time) from a hard drive may be more efficient than reading one byte at a time. Moreover, you could use the asynchronous BeginRead method if you need to process multiple files in parallel.

Summary:

- If you need to compare multiple files, use a hash code.
- To read/compare the content of a single file:
  - Read 1 kB of data into a buffer from both streams
  - See if there is a difference (if yes, quit)
  - Continue looping until both streams are exhausted
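The steps above could look roughly like this. It is a minimal sketch, not code from the answer: `BufferedCompare`, `StreamsEqual` and `ReadFull` are hypothetical names, the 1 kB default mirrors the answer's example (the buffer size is a tuning choice), and the span-based comparison assumes .NET Core 2.1 or later. Because Stream.Read may return fewer bytes than requested, each buffer is filled completely before comparing, so the two chunks always line up.

```csharp
using System;
using System.IO;

static class BufferedCompare
{
    // Hypothetical helper: compares two streams chunk by chunk and
    // returns false as soon as a differing chunk is found.
    public static bool StreamsEqual(Stream a, Stream b, int bufferSize = 1024)
    {
        var bufA = new byte[bufferSize];
        var bufB = new byte[bufferSize];
        while (true)
        {
            int readA = ReadFull(a, bufA);
            int readB = ReadFull(b, bufB);

            // Different amounts read means the streams differ in length.
            if (readA != readB)
                return false;
            // Both streams exhausted with no difference found.
            if (readA == 0)
                return true;
            // Compare only the bytes actually read; quit on the first difference.
            if (!new ReadOnlySpan<byte>(bufA, 0, readA).SequenceEqual(new ReadOnlySpan<byte>(bufB, 0, readB)))
                return false;
        }
    }

    // Reads until the buffer is full or the stream ends; returns the bytes read.
    private static int ReadFull(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0)
                break;
            total += n;
        }
        return total;
    }
}
```

For the question's case, the MemoryStream would be rewound (`ms.Position = 0`) and passed together with a FileStream from `File.OpenRead(path)`, disposed with a `using` statement.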

Implement the above steps asynchronously using BeginRead if you need to process multiple files in parallel.
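The answer suggests BeginRead; a sketch of the same idea using the newer Task-based Stream.ReadAsync instead (a substitution, not the answer's API) might look like this. It assumes the image bytes are already in a byte[] (for example from `ms.ToArray()`); `ParallelCompare`, `FindMatchesAsync`, `FileEqualsAsync` and `ChunkEquals` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ParallelCompare
{
    // Hypothetical helper: compares `expected` with every file in `paths`
    // concurrently and returns the paths whose contents match exactly.
    public static async Task<List<string>> FindMatchesAsync(byte[] expected, IEnumerable<string> paths)
    {
        string[] list = paths.ToArray();
        bool[] results = await Task.WhenAll(list.Select(p => FileEqualsAsync(expected, p)));

        var matches = new List<string>();
        for (int i = 0; i < list.Length; i++)
            if (results[i])
                matches.Add(list[i]);
        return matches;
    }

    private static async Task<bool> FileEqualsAsync(byte[] expected, string path, int bufferSize = 4096)
    {
        // useAsync: true requests asynchronous I/O where the OS supports it.
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufferSize, useAsync: true);
        // Cheap length check before reading any contents.
        if (fs.Length != expected.Length)
            return false;

        var buffer = new byte[bufferSize];
        int offset = 0;
        while (offset < expected.Length)
        {
            // Never ask for more bytes than remain in `expected`.
            int toRead = Math.Min(buffer.Length, expected.Length - offset);
            int n = await fs.ReadAsync(buffer, 0, toRead);
            if (n == 0)
                return false; // file ended early (e.g. truncated while reading)
            if (!ChunkEquals(buffer, 0, expected, offset, n))
                return false; // first differing chunk: stop reading this file
            offset += n;
        }
        return true;
    }

    // Synchronous helper so the span comparison stays outside the async method.
    private static bool ChunkEquals(byte[] a, int aOffset, byte[] b, int bOffset, int count) =>
        new ReadOnlySpan<byte>(a, aOffset, count).SequenceEqual(new ReadOnlySpan<byte>(b, bOffset, count));
}
```

Task.WhenAll starts every comparison at once, so with a very large number of candidate files it may be worth limiting how many run concurrently.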

Tags: c#, .net, file, comparison, image

devios1
