
Why does text from Assembly.GetManifestResourceStream() start with three junk characters?

I have a SQL file added to my VS.NET 2008 project as an embedded resource. Whenever I use the following code to read the file's content, the string returned always starts with three junk characters and then the text I expect. I assume this has something to do with the Encoding.Default I am using, but that is just a guess. Why does this text keep showing up? Should I just trim off the first three characters or is there a more informed approach?

public string GetUpdateRestoreSchemaScript()
{
    var type = GetType();
    var a = Assembly.GetAssembly(type);
    var script = "UpdateRestoreSchema.sql";
    var resourceName = String.Concat(type.Namespace, ".", script);
    using(Stream stream = a.GetManifestResourceStream(resourceName))
    {
        byte[] buffer = new byte[stream.Length];
        stream.Read(buffer, 0, buffer.Length);
        // UPDATE: Should be Encoding.UTF8
        return Encoding.Default.GetString(buffer);
    }
}

Update: I now know that my code works as expected if I change the last line to decode the buffer as UTF-8. That will always hold for this embedded file, but will it hold in general? Is there a way to test an arbitrary buffer to determine its encoding?

flipdoubt asked Feb 23 '09 18:02


1 Answer

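The file is probably UTF-8 encoded and saved with a byte order mark (BOM), the three bytes EF BB BF. Encoding.Default is not UTF-8 here: on the .NET Framework it is the system's ANSI code page, so those three bytes are decoded as three junk characters. Why don't you use a specific encoding?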
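As a minimal sketch of "use a specific encoding", assuming the resource is UTF-8 text that may or may not begin with a BOM: a StreamReader constructed with Encoding.UTF8 and BOM detection enabled consumes the BOM instead of returning it as text. The ResourceText class and its ReadAllText helper are hypothetical names used for illustration, and the MemoryStream stands in for the stream returned by GetManifestResourceStream.

```csharp
using System;
using System.IO;
using System.Text;

public static class ResourceText
{
    // Hypothetical helper: decodes a stream as text. A BOM, if present,
    // selects the encoding and is skipped; otherwise UTF-8 is assumed.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Simulate a UTF-8 file saved with a BOM (EF BB BF + content).
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("SELECT 1;");
        byte[] file = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, file, 0, bom.Length);
        Buffer.BlockCopy(body, 0, file, bom.Length, body.Length);

        string text = ReadAllText(new MemoryStream(file));
        Console.WriteLine(text == "SELECT 1;"); // True: no leading junk
    }
}
```

In the question's method, the same helper could be called on the stream from a.GetManifestResourceStream(resourceName). Note that Encoding.UTF8.GetString(buffer) does not remove a BOM; it decodes it to the invisible character U+FEFF at the start of the string, which is why reading through a StreamReader is the safer choice here.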

Edit to answer a comment:

To guess a file's encoding, you can look for a byte order mark (BOM) at the start of the stream. If one is present, it identifies the encoding; if not, you can only guess or ask the user.
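The BOM check described above can be sketched as follows. This is an illustration rather than a complete detector: BomSniffer and DetectFromBom are hypothetical names, only the common Unicode BOMs are covered, and a null result means "no BOM found", not "not Unicode".

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null if the
    // buffer does not start with one of the BOMs checked here.
    public static Encoding DetectFromBom(byte[] b)
    {
        if (b == null) return null;
        // UTF-32 LE is checked before UTF-16 LE: both start with FF FE.
        if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return Encoding.UTF32;                      // UTF-32 little-endian
        if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);       // UTF-32 big-endian
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return Encoding.UTF8;
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return Encoding.Unicode;                    // UTF-16 little-endian
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return Encoding.BigEndianUnicode;           // UTF-16 big-endian
        return null;                                    // no BOM: caller must guess or ask
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };
        byte[] plain = { (byte)'h', (byte)'i' };
        Console.WriteLine(DetectFromBom(withBom).WebName); // utf-8
        Console.WriteLine(DetectFromBom(plain) == null);   // True
    }
}
```

When the result is null, the caller still has to fall back to a default such as UTF-8 or ask the user, as the answer says.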

Alex Reitbort answered Oct 22 '22 02:10