How to use ReadAllText when file encoding unknown

Tags: c#, .net, encoding

I'm reading a file with ReadAllText:

    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }

With some files I sometimes get the wrong characters back (for ÖÜÄÀ...). They come out as '?', which points to a problem with the encoding:

    output: 0 TEST
    output: 1 A??O?

One solution would be to pass the encoding to ReadAllText, for example ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which might fix the problem. But what if I still get '?' as output? What if I don't know the encoding of the file? And what if every file has a different encoding? What would be the best way to handle this in C#? Thank you.

asked May 25 '12 11:05

sabisabi

2 Answers

The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. (The BOM primarily indicates the byte order of the encoding, but it also identifies the encoding itself, e.g. UTF-8, UTF-16 or UTF-32.) Unfortunately, this only works for Unicode-based encodings whose files actually start with a BOM; for older encodings, much less reliable methods must be used.

The StreamReader type supports detecting these marks to determine the encoding. You simply pass true for the detectEncodingFromByteOrderMarks parameter:

    new System.IO.StreamReader("path", true)

You can then check the value of streamReader.CurrentEncoding to see which encoding was detected. Note, however, that the value is only reliable after the first read from the stream, and that if no byte order mark exists, this constructor falls back to UTF-8.
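As a minimal sketch of the StreamReader approach described above: the class and method names here (BomDetectionSketch, ReadWithBomDetection) are hypothetical and exist only for illustration. The encoding is queried after the read, because CurrentEncoding may change once the reader has looked at the first bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionSketch
{
    // Hypothetical helper: reads the whole file and reports the encoding
    // that StreamReader settled on after inspecting the byte order mark.
    public static string ReadWithBomDetection(string path, out Encoding detected)
    {
        using (var reader = new StreamReader(path, true)) // true = detect encoding from BOM
        {
            string text = reader.ReadToEnd();
            // Only query CurrentEncoding after reading; before the first
            // read it still holds the constructor's default (UTF-8).
            detected = reader.CurrentEncoding;
            return text;
        }
    }
}
```

The returned string can then be split on ';' exactly as in the question. For a file without a BOM, detected will simply be UTF-8, which says nothing about the file's real encoding.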

Refer to the CodeProject solution for detecting the encoding.

answered Oct 23 '22 05:10

Romil Kumar Jain

You have to check the file's encoding first. Try this:

    // Requires: using System.IO; using System.Text;
    // filePath is the path of the file to inspect.
    Encoding enc;
    FileStream file = new FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read);
    if (file.CanSeek)
    {
        byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
        int n = file.Read(bom, 0, 4); // may return fewer than 4 bytes for tiny files
        if (n >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
            enc = Encoding.UTF8; // UTF-8
        else if (n >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
            enc = Encoding.UTF32; // UTF-32 LE; must be tested before UTF-16 LE
        else if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
            enc = new UTF32Encoding(true, true); // UTF-32 BE
        else if (n >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
            enc = Encoding.Unicode; // UTF-16 LE
        else if (n >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
            enc = Encoding.BigEndianUnicode; // UTF-16 BE
        else
            enc = Encoding.ASCII; // no BOM found; note that ASCII decodes bytes above 0x7F as '?'

        // Now reposition the file cursor back to the start of the file
        file.Seek(0, SeekOrigin.Begin);
    }
    else
    {
        // The stream cannot be randomly accessed, so the BOM cannot be read and rewound.
        // Pick a default based on the data you expect: Encoding.ASCII if most of it comes
        // from older applications, Encoding.Unicode if most of it comes from newer ones.
        enc = Encoding.ASCII;
    }
    // Read from 'file' using 'enc', then call file.Dispose() when done.
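For files without a BOM, a common heuristic is to try strict UTF-8 first and fall back to a single-byte encoding when the bytes are not valid UTF-8. The sketch below is one possible way to do that, not part of the answer above. EncodingFallbackSketch and ReadAllTextGuessing are hypothetical names, and the choice of fallback encoding is an assumption the caller has to make, since a single-byte encoding cannot be verified from the bytes alone.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingFallbackSketch
{
    // Hypothetical helper: honors a BOM if present, otherwise tries strict
    // UTF-8, and finally decodes with the caller-supplied fallback encoding.
    public static string ReadAllTextGuessing(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // Second argument true = throw on invalid byte sequences
            // instead of silently substituting replacement characters.
            var strictUtf8 = new UTF8Encoding(false, true);
            using (var reader = new StreamReader(new MemoryStream(bytes), strictUtf8, true))
            {
                return reader.ReadToEnd();
            }
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8 and no BOM: assume the fallback encoding.
            return fallback.GetString(bytes);
        }
    }
}
```

Calling it with Encoding.GetEncoding("iso-8859-1") as the fallback would, under that assumption, decode Western European files written without a BOM. A file in some other legacy code page would still decode without error but show the wrong characters, so this remains a guess.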

answered Oct 23 '22 05:10

Md Kamruzzaman Sarker
