Visual Studio encoding problems

I have problems with file encoding in Visual Studio 2008. When compiling, I get errors like these:

alt text

When I try to open a file in which one of these errors occurs, the encoding window appears:

alt text

By default, auto-detect is selected. When I change the encoding option to UTF-8, everything works. If I open each problematic file in my project using UTF-8 encoding, the project starts to compile. The problem is that I have too many files, and it is ridiculous to open each one and set the encoding to UTF-8. Is there a quick way to do this?

My VS settings are:

alt text

I'm using Windows Server 2008 R2.

UPDATE:

For Hans Passant and Noah Richards: thanks for the interaction. I recently changed my operating system, so everything is fresh. I've also downloaded a fresh copy of the solution from source control.

In the OS regional settings I've changed the system locale to Polish (Poland):

alt text

In VS I've changed the international settings to match Windows:

alt text

The problem is still not solved.

When I open some .cs files using auto-detection for encoding and then check File -> Advanced Save Options..., some of these .cs files have code page 1250:

alt text

but internally they look like this:

alt text

It is weird, because when I check the properties of these particular files in source control, they seem to have UTF-8 encoding set:

alt text

I don't understand this mismatch.

All other files have UTF-8 encoding:

alt text

and open correctly. I have basically no idea what is going wrong, because as far as I know my friend has the same options set as I do, and the same project compiles correctly for him. So far he, happily, hasn't encountered any encoding issues.

jwaliszko asked Nov 09 '10 10:11



2 Answers

That uppercase A with circumflex tells me that the file is UTF-8 (if you look with a hex editor you will probably see that the bytes are C2 A0). That is a non-breaking space in UTF-8.
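
To illustrate the point, here is a minimal Python sketch (Python is only my choice for the demonstration, it is not part of the question). It assumes the file is being misread as code page 1250, which is what the question reports for the problematic files.

```python
# U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes C2 A0.
nbsp_bytes = "\u00a0".encode("utf-8")

# Read as Windows-1250, byte C2 becomes "Â" (U+00C2) and byte A0 stays a
# non-breaking space, so the editor shows a stray "Â" before the space.
misread = nbsp_bytes.decode("cp1250")

print(nbsp_bytes.hex())   # hex dump of the UTF-8 bytes
print(repr(misread))      # what a code page 1250 reader sees
```

So a lone Â in front of what looks like a space is a strong hint that the bytes on disk are UTF-8 and only the interpretation is wrong.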

Visual Studio does not detect the encoding because (most likely) there are not enough high-ASCII characters in the file to help with a reliable detection.

Also, there is no BOM (Byte Order Mark). That would help with the detection (this is the "signature" in the "UTF-8 with signature" description).

What you can do: add a BOM to all the files that don't have one. How? Create a file that contains only a BOM (empty file in Notepad, Save As, select UTF-8 as the encoding). It will be 3 bytes long (EF BB BF). You can then copy it to the beginning of each file that is missing the BOM:

   copy /b/v BOM.txt + YourFile.cs YourFile_Ok.cs
   ren YourFile.cs YourFile_Org.cs
   ren YourFile_Ok.cs YourFile.cs

Make sure there is a + between the name of the BOM file and the name of the original file.

Try it on one or two files, and if it works you can create some batch file to do that. Or a small C# application (since you are a C# programmer), that can detect if the file already has a BOM or not, so that you don't add it twice. Of course, you can do this in almost anything, from Perl to PowerShell to C++ :-)
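
As a rough sketch of such a tool, here is a Python version (my choice of language; the same logic ports directly to C# or PowerShell). The function names `add_utf8_bom` and `add_bom_to_tree` and the `*.cs` pattern are my own assumptions for illustration. It skips files that already start with EF BB BF, and it also skips files that do not decode as UTF-8, on the assumption that those are genuinely in another encoding and should not be labelled UTF-8.

```python
import codecs
from pathlib import Path
from typing import List


def add_utf8_bom(path: Path) -> bool:
    """Prepend the UTF-8 BOM (EF BB BF) if the file lacks it.

    Returns True if the file was rewritten, False if it was left alone.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has the signature, do not add it twice
    try:
        data.decode("utf-8")  # validity check only; result is discarded
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so leave the file untouched
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


def add_bom_to_tree(root: Path, pattern: str = "*.cs") -> List[Path]:
    """Apply add_utf8_bom to every matching file under root (recursive)."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and add_utf8_bom(path):
            changed.append(path)
    return changed
```

You would call it as `add_bom_to_tree(Path("path/to/your/solution"))` (a placeholder path) and get back the list of files that were changed. It rewrites files in place, so run it on a clean checkout where source control can undo it.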

Mihai Nita answered Oct 20 '22 22:10



Once you've opened the files in UTF-8 mode, can you try changing the Advanced Save Options for the file and saving it (as UTF-8 with signature, if you think these files should be UTF-8)?

Encoding auto-detection is best-effort, so it's likely that something in the file is causing it to be detected as something other than UTF-8. For example, the file may have only ASCII characters in its first kilobyte, or it may have a BOM that indicates an encoding other than UTF-8. Re-saving the file as UTF-8 with signature should (hopefully) correct that.
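
If it helps to find out which files are at risk before re-saving anything, here is a small Python sketch that classifies a file's raw bytes. It does not reproduce Visual Studio's detection logic; it only reports whether a BOM is present and whether the bytes are pure ASCII, valid UTF-8 without a BOM, or neither. The function name `classify` and the label strings are made up for this example.

```python
import codecs

# Longer BOMs first, so UTF-32 LE (FF FE 00 00) is not reported as
# UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def classify(data: bytes) -> str:
    """Return a short label describing how the bytes identify themselves."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return "bom:" + name
    if all(byte < 0x80 for byte in data):
        return "ascii-only"  # identical in UTF-8 and code page 1250
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"  # probably a legacy code page
    return "utf-8-no-bom"  # the case that auto-detection can get wrong
```

Running `classify(open(name, "rb").read())` over the project's .cs files and listing the ones labelled `utf-8-no-bom` should give you the candidates for re-saving with a signature.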

If it continues happening after that, let me know, and we can try to track down what is causing them to be created/saved like that in the first place.

Noah Richards answered Oct 20 '22 22:10
