I'm trying to differentiate between "text files" and "binary" files, as I would effectively like to ignore files with "unreadable" contents.
I have a file that I believe is a GZIP archive. I'm tring to ignore this kind of file by detecting the magic numbers / file signature. If I open the file with the Hex editor plugin in Notepad++ I can see the first three hex codes are 1f 8b 08
.
However if I read the file using a StreamReader
, I'm not sure how to get to the original bytes..
using (var streamReader = new StreamReader(@"C:\file"))
{
char[] buffer = new char[10];
streamReader.Read(buffer, 0, 10);
var s = new String(buffer);
byte[] bytes = new byte[6];
System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
var hex = BitConverter.ToString(bytes);
var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}
At the end of the using statement I have the following variable values:
hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"
Neither of which start with the hex values shown in Notepad++.
Is it possible to get the original bytes from the result of reading a file via StreamReader
?
Your code tries to change a binary buffer into a string. Strings are Unicode in NET so two bytes are required. The resulting is a bit unpredictable as you can see.
Just use a BinaryReader and its ReadBytes method
using(FileStream fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
{
using (var reader = new BinaryReader(fs, new ASCIIEncoding()))
{
byte[] buffer = new byte[10];
buffer = reader.ReadBytes(10);
if(buffer[0] == 31 && buffer[1] == 139 && buffer[2] == 8)
// you have a signature match....
}
}
Usage (for a pdf file):
Assert.AreEqual("25504446", GetMagicNumbers(filePath, 4));
Method GetMagicNumbers:
private static string GetMagicNumbers(string filepath, int bytesCount)
{
// https://en.wikipedia.org/wiki/List_of_file_signatures
byte[] buffer;
using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
buffer = reader.ReadBytes(bytesCount);
var hex = BitConverter.ToString(buffer);
return hex.Replace("-", String.Empty).ToLower();
}
You can't. StreamReader
is made to read text, not binary. Use the Stream
directly to read bytes. In your case FileStream
.
To guess whether a file is text or binary you could read the first 4K into a byte[]
and interpret that.
Btw, you tried to force chars into bytes. This is invalid by principle. I suggest you familiarize yourself with what an Encoding
is: it is the only way to convert between chars and bytes in a semantically correct way.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With