I'm writing a TFS check-in policy which checks whether our source files contain our file header.
My problem is that our file header contains the special character "©", and unfortunately some of our source files are encoded in ANSI. So when I read these files in the policy, the string looks like this: "Copyright � 2009".
string content = File.ReadAllText(pendingChange.LocalItem);
I tried to change the encoding of the string, but that doesn't help. So how can I read these files so that I get the correct string "Copyright © 2009"?
Opening one of these files in Notepad++, the Encoding menu indicates it is ANSI. If I copy/paste the content into a new file for which Notepad++ indicates the encoding is UTF-8, and use that file in my program, then it works fine.
ANSI and UTF-8 are both text encodings. ANSI is a common single-byte format used to encode the Latin alphabet, whereas UTF-8 is a variable-length Unicode encoding (1 to 4 bytes per character) that can represent every Unicode character.
ANSI encoding is a slightly generic term used to refer to the standard code page on a system, usually Windows. It is more properly referred to as Windows-1252 on Western/U.S. systems. (It can represent certain other Windows code pages on other systems.)
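For illustration, here is a minimal sketch (assuming a Western Windows-1252 system code page; on .NET Core / .NET 5+ the code page must first be registered via the System.Text.Encoding.CodePages package) showing why the mismatch produces the "�" symptom: "©" is a single byte in Windows-1252 but two bytes in UTF-8, so decoding the ANSI byte as UTF-8 fails.

using System;
using System.Text;

class CopyrightBytesDemo
{
    static void Main()
    {
        // "©" is one byte (0xA9) in Windows-1252 but two bytes (0xC2 0xA9) in UTF-8.
        byte[] ansiBytes = Encoding.GetEncoding(1252).GetBytes("©");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("©");

        Console.WriteLine(BitConverter.ToString(ansiBytes)); // A9
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C2-A9

        // Decoding the Windows-1252 byte as UTF-8 reproduces the symptom,
        // while decoding it with the matching code page gives the expected text.
        Console.WriteLine(Encoding.UTF8.GetString(ansiBytes));              // "�"
        Console.WriteLine(Encoding.GetEncoding(1252).GetString(ansiBytes)); // "©"
    }
}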
Use Encoding.Default:
string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default);
You should be aware, however, that this reads the file using the system default encoding, which may not be the same as the encoding of the file. There's no single encoding called ANSI, but usually when people talk about "the ANSI encoding" they mean Windows Code Page 1252 or whatever their box happens to use.
Your code will be more robust if you can find out the exact encoding used.
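One way to do that, sketched below, is to check the start of the file for a byte order mark and fall back to Encoding.Default when none is present. EncodingSniffer is a hypothetical helper name, and the commented usage assumes the pendingChange variable from the policy code above.

using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Returns the encoding implied by a BOM at the start of the file,
    // or the supplied fallback (e.g. Encoding.Default) when no BOM is found.
    public static Encoding Detect(string path, Encoding fallback)
    {
        byte[] bom = new byte[3];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(bom, 0, 3);
        }

        if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
            return Encoding.UTF8;              // UTF-8 "with signature"
        if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little-endian
        if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big-endian

        return fallback;
    }
}

// Usage:
// Encoding enc = EncodingSniffer.Detect(pendingChange.LocalItem, Encoding.Default);
// string content = File.ReadAllText(pendingChange.LocalItem, enc);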
It would seem sensible, if you are going to have such policies, that you would also have a team-agreed standard encoding. To be honest, I can't see why any team would use an encoding other than "Unicode (UTF-8 with signature) - Codepage 65001" (except perhaps for ASPX pages with significant non-Latin static content, but even then I can't see why it would be a big deal to use UTF-8).
Assuming you still want to allow mixed encodings, you then need a way to determine which encoding a file was saved in, so you know which encoding to pass to ReadAllText. It's not easy to determine this from the file itself; however, using Encoding.Default is likely to work fine, since you most likely have only two encodings to deal with: the Visual Studio default (UTF-8 with signature) and a common ANSI encoding used by your machines (probably Windows-1252).
Hence using
string content = File.ReadAllText(pendingChange.LocalItem, Encoding.Default);
will work (as Jon has already posted). This works because, when the UTF-8 BOM (which is what Visual Studio means by "signature") is present at the start of the file, the supplied encoding parameter is ignored and UTF-8 is used anyway. Hence where the file is saved using UTF-8 you get correct results, and where ANSI is used you are most likely also to get correct results.
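A small self-contained demo of that behaviour; the temp file and header string below are just placeholders:

using System;
using System.IO;
using System.Text;

class BomOverrideDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // Save the header the way Visual Studio does for "UTF-8 with signature":
        // new UTF8Encoding(true) writes the EF BB BF preamble (the BOM).
        File.WriteAllText(path, "Copyright © 2009", new UTF8Encoding(true));

        // Even though the ANSI code page is passed here, File.ReadAllText
        // detects the BOM and decodes the file as UTF-8 anyway.
        string content = File.ReadAllText(path, Encoding.Default);
        Console.WriteLine(content); // Copyright © 2009

        File.Delete(path);
    }
}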
BTW, if you are processing file headers, wouldn't ReadAllLines make things easier?
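For example, a rough sketch of the header check; HasCopyrightHeader and the 10-line window are assumptions for illustration, not part of the check-in policy API:

using System;
using System.IO;
using System.Text;

static class HeaderCheck
{
    // ReadAllLines splits the file into lines, so the policy only needs to
    // scan the first few lines for the copyright notice.
    public static bool HasCopyrightHeader(string path)
    {
        string[] lines = File.ReadAllLines(path, Encoding.Default);
        int linesToScan = Math.Min(lines.Length, 10);
        for (int i = 0; i < linesToScan; i++)
        {
            if (lines[i].Contains("Copyright © 2009"))
                return true;
        }
        return false;
    }
}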