
warning C4819: How to find the character that has to be saved in unicode?


I recently started seeing the following warning in VS2010.

Warning 21 warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss

c:\boost\vs2010_boost1.49\include\boost\format\alt_sstream_impl.hpp 1 

According to MSDN, the file contains a character that can only be preserved if the file is saved in Unicode format.

Question: I didn't touch those files myself. Is there a way to find those characters and correct them manually? In other words, I don't want to save the source file in Unicode format.

Thank you

q0987 asked May 08 '12 15:05



2 Answers

You can use Notepad++ to find every non-ASCII character in a file with a regular expression:

  1. Open your file in Notepad++.
  2. Ensure that you select UTF-8 from the Encoding menu.
  3. Open the search box (use CTRL-F or go to the Search menu and select Find...).
  4. Under Search Mode, select the radio button for Regular expression.
  5. Enter [^\x00-\x7F] in the Find what box and hit the Find Next button to jump to each match.

After you find the non-ASCII character(s), you can remove or replace them, change the encoding back to ANSI, and save the file.

You don't have to use Notepad++, of course. The same regular expression works in other text editors, e.g., Sublime Text.
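If you would rather script the search than click through an editor, the same idea (anything outside the 0x00-0x7F range) can be sketched in Python. This is only a rough sketch: the function names `find_non_ascii` and `report` are made up for illustration, and the file is read as raw bytes so that no assumption is made about its encoding. The reported column is a byte offset within the line, which may differ from the column an editor shows.

```python
import sys


def find_non_ascii(data: bytes):
    """Return (line, column, byte value) for every byte outside 0x00-0x7F.

    Lines and columns are 1-based; the column is a byte offset in the line.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col_no, byte))
    return hits


def report(path):
    """Print one 'path:line:column' entry per non-ASCII byte in the file."""
    with open(path, "rb") as f:
        for line_no, col_no, byte in find_non_ascii(f.read()):
            print(f"{path}:{line_no}:{col_no}: byte 0x{byte:02X}")


if __name__ == "__main__":
    # Usage (script name is hypothetical): python find_non_ascii.py file1.hpp file2.hpp
    for path in sys.argv[1:]:
        report(path)
```

Passing the header named in the warning as an argument should list the positions to inspect; a file that produces no output contains only ASCII bytes.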

schellack answered Apr 05 '23 20:04


I ran into this problem in my project and tried to fix all of the offending characters by hand. I gave up and looked for another way, because too many files had the problem (even though all of the characters were in comments).

Then I found a quick fix: changing the system locale.

Control Panel -> Clock, Language, and Region -> Region and Language -> Administrative -> Language for non-Unicode programs -> Change system locale -> English

I think this could fix your problem if your 'system locale' is not English.

https://stackoverflow.com/a/37871883/3148107
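A likely reason the locale change helps is that the warning depends on whether the file's bytes are valid in the active code page (936 in the question). The Python sketch below illustrates that idea only; it is not how the compiler works internally. The helper name `decodes_in` and the sample bytes are made up, and Python's `cp936` and `cp1252` codecs are assumed to be close enough to the Windows code pages for this purpose.

```python
def decodes_in(data: bytes, code_page: str) -> bool:
    """Return True if every byte sequence in data is valid in code_page."""
    try:
        data.decode(code_page)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical comment line containing the single byte 0xA9, which is the
# copyright sign in code page 1252 but a lead byte in code page 936.
sample = b"// \xa9 hypothetical comment\n"

for code_page in ("cp936", "cp1252"):
    print(code_page, decodes_in(sample, code_page))
```

Here the sample is rejected by `cp936`, because 0xA9 followed by a space is not a valid two-byte sequence, and accepted by `cp1252`. Note that the bytes in the file are unchanged either way; only their interpretation differs.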

claymore answered Apr 05 '23 21:04