Yes is a most frequent question, and this matter is vague for me and since I don't know much about it.
But i would like a very precise way to find a files Encoding. So precise as Notepad++ is.
Open up your file using regular old vanilla Notepad that comes with Windows. It will show you the encoding of the file when you click "Save As...". Whatever the default-selected encoding is, that is what your current encoding is for the file.
The evaluated encoding of the open file will display on the bottom bar, far right side. The encodings supported can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop down.
Open the file in Notepad. Click 'Save As...'. In the 'Encoding:' combo box you will see the current file format. Yes, I opened the file in notepad and selected the UTF-8 format and saved it.
The StreamReader.CurrentEncoding
property rarely returns the correct text file encoding for me. I've had greater success determining a file's endianness, by analyzing its byte order mark (BOM). If the file does not have a BOM, this cannot determine the file's encoding.
*UPDATED 4/08/2020 to include UTF-32LE detection and return correct encoding for UTF-32BE
/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when detection of the text file's endianness fails.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
// Read the BOM
var bom = new byte[4];
using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
file.Read(bom, 0, 4);
}
// Analyze the BOM
if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
// We actually have no idea what the encoding is if we reach this point, so
// you may wish to return null instead of defaulting to ASCII
return Encoding.ASCII;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With