
"Unmappable character for encoding UTF-8" error

I'm getting a compile error at the following method.

    public static boolean isValidPasswd(String passwd) {
        String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
        return Pattern.matches(reg, passwd);
    }
The error is at Utility.java:[76,74]: unmappable character for encoding UTF-8. The 74th character is '"'.

How can I fix this? Thanks.

Ravi asked Feb 14 '11 17:02


People also ask

How to fix unmappable character for encoding UTF-8?

In Eclipse, open the file properties (Alt + Enter) and set Resource → 'Text file encoding' → Other to UTF-8. Reopen the file and look for a junk character somewhere in the string or file. Remove it and save the file.

Can ASCII files be read as UTF-8?

ASCII is a subset of UTF-8, so any ASCII-encoded document can be read as UTF-8 unchanged. ASCII uses only 7 bits, and UTF-8 sets the otherwise unused eighth bit on every byte that belongs to a non-ASCII character.
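As a small illustration of the subset claim, the sketch below compares the bytes Java produces for the same string under US-ASCII and UTF-8. The class name `AsciiIsUtf8` and the helper `sameBytes` are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiIsUtf8 {
    // True when the string encodes to identical bytes in US-ASCII and UTF-8.
    static boolean sameBytes(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.US_ASCII),
                s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Pure ASCII text: both encodings give the same bytes.
        System.out.println(sameBytes("isValidPasswd"));
        // NOT SIGN (written as an escape so this file stays ASCII):
        // it has no ASCII encoding, so the byte arrays differ.
        System.out.println(sameBytes("\u00AC"));
    }
}
```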


1 Answer

Your source file has an encoding problem. It is probably saved as ISO-8859-1, while the compiler is set to read UTF-8. This causes errors for every character whose byte representation differs between UTF-8 and ISO-8859-1. That is the case for all characters outside ASCII, for example ¬ (NOT SIGN).
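To make the byte difference concrete, here is a minimal sketch that prints how ¬ is encoded in each charset and what happens when the ISO-8859-1 byte is decoded as UTF-8. The class name `NotSignBytes` and the `hex` helper are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class NotSignBytes {
    // Formats a byte array as space-separated upper-case hex, e.g. "C2 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NOT SIGN, written as a Unicode escape so this file is pure ASCII
        // and compiles the same way under any source encoding.
        String notSign = "\u00AC";

        byte[] latin1 = notSign.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = notSign.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + hex(latin1)); // AC
        System.out.println("UTF-8:      " + hex(utf8));   // C2 AC

        // A lone 0xAC byte is not valid UTF-8, so the String constructor
        // substitutes the replacement character U+FFFD.
        String decoded = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("Decoded as UTF-8: U+"
                + String.format("%04X", (int) decoded.charAt(0))); // U+FFFD
    }
}
```

Unlike the `String` constructor, javac reports such a byte as an unmappable character instead of substituting it silently.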

You can reproduce this with the following program. It takes your line of source code, encodes it as an ISO-8859-1 byte array, and then decodes those bytes "wrongly" as UTF-8, so you can see where the line gets corrupted. I added two spaces to your source line so that position 74 lands on ¬ (NOT SIGN), the only character in the line whose bytes differ between ISO-8859-1 and UTF-8. I assume this matches the indentation in the real source file.

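    // The line from the question, padded so that ¬ sits at index 74.
    String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
    // Encode as ISO-8859-1, then decode the same bytes as UTF-8.
    String corrupt = new String(reg.getBytes("ISO-8859-1"), "UTF-8");
    System.out.println(corrupt + ": " + corrupt.charAt(74));
    System.out.println(reg + ": " + reg.charAt(74));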

which results in the following output:

          String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �
          String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB
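The offending position can also be located without guessing at indentation. The sketch below runs a strict UTF-8 decoder over a byte array and reports the offset of the first byte sequence that is not valid UTF-8. The class name `FindBadUtf8`, the method `firstInvalidOffset`, and the short sample string are assumptions made for this example. To check a real file, read its bytes with `Files.readAllBytes` and pass them in.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FindBadUtf8 {
    // Returns the byte offset of the first malformed UTF-8 sequence,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On an error result, the input position is where the bad bytes start.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        String sample = "abc\u00ACdef"; // NOT SIGN at index 3
        byte[] latin1 = sample.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(firstInvalidOffset(latin1)); // 3
        System.out.println(firstInvalidOffset(utf8));   // -1
    }
}
```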

To fix this, save the source files with UTF-8 encoding.
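If re-saving from an editor is not convenient, the conversion can be scripted. This is a minimal sketch that assumes the file really is ISO-8859-1, as guessed above. The class name `Latin1ToUtf8` is made up for this example. It overwrites the file in place, so keep a backup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Latin1ToUtf8 {
    // Reinterprets ISO-8859-1 bytes as text and re-encodes that text as UTF-8.
    static byte[] transcode(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Latin1ToUtf8 <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Assumption: the file is ISO-8859-1. Running this on a file that is
        // already UTF-8 would double-encode its non-ASCII characters.
        Files.write(file, transcode(Files.readAllBytes(file)));
    }
}
```

Alternatively, the file can stay as it is if the compiler is told its real encoding, for example with `javac -encoding ISO-8859-1 Utility.java`.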

Michael Konietzka answered Sep 19 '22 17:09