I'm calling File.ReadAllText()
in a program designed to format some files that I have.
Some of these files contain the ®
(174) symbol. However, when the text is being read, the returned string contains �
(65533) symbols where the ®
(174) should be.
What would cause this and how can I fix it?
Most likely the file contains a different encoding than the default. If you know it, you can specify it using the File.ReadAllText Method (String, Encoding) override.
Code sample:
string readText = File.ReadAllText(path, Encoding.Default); // <-- change the encoding to whatever the encoding really is
If you DON'T know the encoding, see this previous SO question: How to use ReadAllText when file encoding unknown
You need to specify the encoding when you call File.ReadAllText
, unless the file is actually in UTF-8, which it sounds like it's not. (Basically the one-parameter overload is equivalent to passing in UTF-8 as the second argument. It will also detect UTF-32 with an appropriate byte-order mark, I believe.)
The first thing is to work out which encoding it is in (e.g. ISO-8859-1 - but you need to check this) and then pass that as a second argument.
For example:
Encoding isoLatin1 = Encoding.GetEncoding(28591);
string text = File.ReadAllText(path, isoLatin1);
It's always important that you know what encoding binary data is using before you try to read it as text. That's true for files, network streams, anything.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With