Original file bytes from StreamReader, magic number detection

I'm trying to differentiate between "text files" and "binary" files, as I would effectively like to ignore files with "unreadable" contents.

I have a file that I believe is a GZIP archive. I'm trying to ignore this kind of file by detecting the magic numbers / file signature. If I open the file with the Hex editor plugin in Notepad++, I can see that the first three bytes are 1f 8b 08.

However, if I read the file using a StreamReader, I'm not sure how to get at the original bytes.

using (var streamReader = new StreamReader(@"C:\file"))
{
    char[] buffer = new char[10];
    streamReader.Read(buffer, 0, 10);
    var s = new String(buffer);

    byte[] bytes = new byte[6];
    System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
    var hex = BitConverter.ToString(bytes);

    var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}

At the end of the using statement I have the following variable values:

hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"

Neither of these starts with the hex values shown in Notepad++.

Is it possible to get the original bytes from the result of reading a file via StreamReader?

Tom Hunter asked Feb 10 '13 12:02


3 Answers

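Your code tries to turn a binary buffer into a string. Strings in .NET are Unicode (UTF-16), so each char takes two bytes, and StreamReader decodes the file's bytes as text (UTF-8 by default) before you ever see them. The result is a bit unpredictable, as you can see.

Just use a BinaryReader and its ReadBytes method: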

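// Requires: using System.IO;
using (FileStream fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
{
    // ReadBytes returns fewer bytes than requested if the file is shorter.
    byte[] buffer = reader.ReadBytes(10);

    // GZIP signature: 0x1F 0x8B, followed by 0x08 (deflate compression method).
    if (buffer.Length >= 3 && buffer[0] == 0x1F && buffer[1] == 0x8B && buffer[2] == 0x08)
    {
        // you have a signature match....
    }
}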

Steve answered Oct 16 '22 23:10


Usage (for a PDF file, whose signature is the ASCII text "%PDF"):

Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));

Method GetMagicNumbers:

private static string GetMagicNumbers(string filepath, int bytesCount)
{
    // https://en.wikipedia.org/wiki/List_of_file_signatures

    byte[] buffer;
    using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
    using (var reader = new BinaryReader(fs))
        buffer = reader.ReadBytes(bytesCount);

    var hex = BitConverter.ToString(buffer);
    return hex.Replace("-", String.Empty).ToLower();
}

hdev answered Oct 16 '22 22:10


You can't. StreamReader is made to read text, not binary data. Use the Stream directly to read bytes; in your case, that is a FileStream.

To guess whether a file is text or binary you could read the first 4K into a byte[] and interpret that.
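
One way to "interpret" that sample is sketched below. It is only a rough heuristic, not something this answer prescribes: it assumes that ordinary ASCII or UTF-8 text never contains a NUL (0x00) byte, so any NUL in the sample marks the file as binary. The class name BinarySniffer and the method LooksBinary are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

public static class BinarySniffer
{
    // Hypothetical helper: reads up to sampleSize bytes from the start of
    // the file and applies the NUL-byte heuristic to them.
    public static bool LooksBinary(string path, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        int total = 0;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Stream.Read may return fewer bytes than requested, so loop
            // until the buffer is full or the end of the file is reached.
            int read;
            while (total < buffer.Length &&
                   (read = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
        }
        return LooksBinary(buffer, total);
    }

    // Assumption: a NUL byte does not occur in ordinary ASCII/UTF-8 text,
    // so finding one in the first `count` bytes suggests binary content.
    public static bool LooksBinary(byte[] sample, int count)
    {
        return Array.IndexOf(sample, (byte)0, 0, count) >= 0;
    }
}
```

Note the limits of this assumption: UTF-16 text files are full of NUL bytes and would be reported as binary, and a binary file whose first 4K happens to contain no NUL would pass as text. Checking known signatures such as 1f 8b 08 first, and falling back to the heuristic afterwards, may be more reliable.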

Btw, you tried to force chars into bytes. This is invalid in principle. I suggest you familiarize yourself with what an Encoding is: it is the only way to convert between chars and bytes in a semantically correct way.
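
To make this concrete, here is a small sketch that reproduces the two hex strings from the question without touching a file. It assumes the file starts with the bytes 1f 8b 08 shown in the hex editor and that StreamReader used its default UTF-8 decoding. DecodeDemo and RoundTrip are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes raw bytes as UTF-8 (the StreamReader default), then re-encodes
    // the resulting string with the target encoding and formats it as hex.
    public static string RoundTrip(byte[] raw, Encoding target)
    {
        // 0x8B is not valid UTF-8 on its own, so the decoder substitutes
        // the replacement character U+FFFD for it.
        string decoded = Encoding.UTF8.GetString(raw);
        return BitConverter.ToString(target.GetBytes(decoded));
    }
}

// byte[] magic = { 0x1F, 0x8B, 0x08 };
// DecodeDemo.RoundTrip(magic, Encoding.Unicode) returns "1F-00-FD-FF-08-00"
// DecodeDemo.RoundTrip(magic, Encoding.UTF8)    returns "1F-EF-BF-BD-08"
```

The first result matches the question's hex value (the UTF-16 memory of the chars), and the second matches the start of otherhex. Every invalid byte decodes to the same U+FFFD, so the original 0x8B cannot be recovered from the string.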

usr answered Oct 16 '22 23:10