Why do these two files hash to the same value when I use MemoryStream?

I'm writing a C# routine that creates hashes from JPEG files. If I pass a byte array to my SHA512 object, I get the expected behavior. However, if I pass a MemoryStream, the two files always hash to the same value.

Example 1:

        SHA512 mySHA512 = SHA512.Create();

        Image img1 = Image.FromFile(@"d:\img1.jpg");
        Image img2 = Image.FromFile(@"d:\img2.jpg");
        MemoryStream ms1 = new MemoryStream();
        MemoryStream ms2 = new MemoryStream();

        img1.Save(ms1, ImageFormat.Jpeg);
        byte[] buf1 = ms1.GetBuffer();
        byte[] hash1 = mySHA512.ComputeHash(buf1);

        img2.Save(ms2, ImageFormat.Jpeg);
        byte[] buf2 = ms2.GetBuffer();
        byte[] hash2 = mySHA512.ComputeHash(buf2);

        if (Convert.ToBase64String(hash1) == Convert.ToBase64String(hash2))
            MessageBox.Show("Hashed the same");
        else
            MessageBox.Show("Different hashes");

That produces "Different hashes". However, one of the ComputeHash overloads takes a Stream, and I'd rather use that. When I do:

        SHA512 mySHA512 = SHA512.Create();

        Image img1 = Image.FromFile(@"d:\img1.jpg");
        Image img2 = Image.FromFile(@"d:\img2.jpg");
        MemoryStream ms1 = new MemoryStream();
        MemoryStream ms2 = new MemoryStream();

        img1.Save(ms1, ImageFormat.Jpeg);
        byte[] hash1 = mySHA512.ComputeHash(ms1);

        img2.Save(ms2, ImageFormat.Jpeg);
        byte[] hash2 = mySHA512.ComputeHash(ms2);

        if (Convert.ToBase64String(hash1) == Convert.ToBase64String(hash2))
            MessageBox.Show("Hashed the same");
        else
            MessageBox.Show("Different hashes");

That produces "Hashed the same".

What's going on here that I'm missing?

asked Nov 11 '09 14:11 by Lee Warner



1 Answer

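You're not rewinding your MemoryStreams. After Save, each stream's Position is at the end of the data, and ComputeHash(Stream) reads from the current position onward, so the hash is computed from an empty sequence of bytes in both cases. Use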

ms1.Position = 0;
ms2.Position = 0;

after calling Save.
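
As a minimal sketch of the effect, the following console program reproduces it without System.Drawing. The helper name HashViaStream and the sample byte arrays are hypothetical and stand in for the image data; the write into the stream plays the role of Image.Save.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class RewindDemo
{
    // Hypothetical helper: writes data into a fresh MemoryStream,
    // optionally rewinds it, then hashes the stream.
    public static string HashViaStream(byte[] data, bool rewind)
    {
        using (SHA512 sha = SHA512.Create())
        using (MemoryStream ms = new MemoryStream())
        {
            // After this write, Position sits at the end of the data,
            // just as it does after Image.Save(ms, ...).
            ms.Write(data, 0, data.Length);

            if (rewind)
                ms.Position = 0; // start reading from the first byte

            // ComputeHash(Stream) reads from the current position to the end.
            return Convert.ToBase64String(sha.ComputeHash(ms));
        }
    }

    public static void Main()
    {
        byte[] a = { 1, 2, 3 }; // stand-in for the first image's bytes
        byte[] b = { 4, 5, 6 }; // stand-in for the second image's bytes

        // Without rewinding, both calls hash zero bytes, so the hashes match.
        Console.WriteLine(HashViaStream(a, false) == HashViaStream(b, false)); // True

        // With rewinding, the actual contents are hashed, so they differ.
        Console.WriteLine(HashViaStream(a, true) == HashViaStream(b, true));   // False
    }
}
```

The same hash for both unrewound streams is simply the SHA-512 of empty input, which is why any two files appear to "hash the same".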

One further note: don't use GetBuffer in this way. Use ToArray, which gives you a byte array exactly the same size as the stream's length. GetBuffer returns the raw underlying buffer, which will (usually) have some unused padding at the end that you wouldn't want to hash accidentally. You can use GetBuffer if you make sure you only use the relevant portion of it, of course; this avoids creating a new copy of the data.
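
To illustrate the difference, here is a small sketch assuming a MemoryStream created with the default constructor (so GetBuffer is permitted). The class and method names are hypothetical; the second method hashes only the first Length bytes of the raw buffer via the ComputeHash(byte[], int, int) overload.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BufferDemo
{
    // Hashes a copy of the stream's contents (exactly Length bytes).
    public static string HashWithToArray(MemoryStream ms)
    {
        using (SHA512 sha = SHA512.Create())
            return Convert.ToBase64String(sha.ComputeHash(ms.ToArray()));
    }

    // Hashes the raw buffer without copying, restricted to the used portion.
    public static string HashWithGetBuffer(MemoryStream ms)
    {
        byte[] raw = ms.GetBuffer(); // may be longer than ms.Length
        using (SHA512 sha = SHA512.Create())
            return Convert.ToBase64String(sha.ComputeHash(raw, 0, (int)ms.Length));
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3 }; // stand-in for saved image bytes
        using (MemoryStream ms = new MemoryStream())
        {
            ms.Write(data, 0, data.Length);

            Console.WriteLine(ms.ToArray().Length == ms.Length);   // True
            Console.WriteLine(ms.GetBuffer().Length >= ms.Length); // True

            // Both approaches hash the same three bytes.
            Console.WriteLine(HashWithToArray(ms) == HashWithGetBuffer(ms)); // True
        }
    }
}
```

Passing the whole GetBuffer result to ComputeHash would instead include the unused tail of the buffer in the hash.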

answered Nov 12 '22 23:11 by Jon Skeet