
How to use UTF-8 with Tomcat

Tomcat does not correctly encode String literals that contain Unicode characters. The problem occurs on a Linux server but not on my development machine (Windows). It affects ONLY String literals, not Strings read from the DB or from a file.

  • I have set URIEncoding="utf-8" on the Connector tag (server.xml).
  • I have used setCharacterEncoding().
  • I checked the stack trace (no filters that might set the encoding).
  • I have set the LANG environment variable.
  • I checked the HTTP headers and they are correct (Content-Type: text/plain;charset=utf-8).
  • I checked the encoding in the browser and it is correct (UTF-8).

Nothing of the above works. Any ideas on what I might be missing?

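import java.io.IOException;
import java.io.Writer;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Test extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // The response charset must be set before getWriter() is called
        resp.setCharacterEncoding("utf-8");
        resp.setContentType("text/plain");

        Writer w = resp.getWriter();
        w.write("Μαλακία Latin"); // Some Unicode characters in a String literal
        w.close();
    }
}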

In the browser, the above shows: Îλληνικά Latin
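The leading Î in that output is what you get when the first byte (0xCE) of a UTF-8 encoded Greek capital letter is decoded as ISO-8859-1, so the symptom is consistent with the literal's UTF-8 bytes being read with a single-byte charset somewhere along the way. The following minimal sketch reproduces that effect in isolation; the sample text and the class and method names are illustrative, not taken from the question.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Sample Greek text written with Unicode escapes, so this demo file is itself
    // pure ASCII and immune to the encoding problem it demonstrates.
    static final String ORIGINAL = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac Latin";

    // Simulates reading UTF-8 bytes with an ISO-8859-1 decoder:
    // every UTF-8 byte becomes its own (wrong) character.
    static String misreadAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misreadAsLatin1(ORIGINAL);
        System.out.println(ORIGINAL.length()); // 14
        System.out.println(garbled.length());  // 22
        // Only the ASCII part " Latin" survives intact.
        System.out.println(garbled);
    }
}
```

Each Greek letter takes two bytes in UTF-8, so the eight letters turn into sixteen characters, while the ASCII part is unaffected.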

asked Mar 22 '12 by idrosid


People also ask

What is the use of UTF-8 in Java?

UTF-8 is a variable-width character encoding that uses one to four 8-bit bytes to represent every valid Unicode code point. A code point usually represents a single character, but some code points have other roles, such as formatting.
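To make "one to four bytes" concrete, this small sketch measures how many UTF-8 bytes a few sample characters need, and also shows that Java's length() counts UTF-16 char units rather than code points. The sample characters are arbitrary choices.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {

    // Number of bytes UTF-8 needs for the given text.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",           // U+0041, ASCII
            "\u00e9",      // U+00E9, e with acute accent
            "\u20ac",      // U+20AC, euro sign
            "\ud83d\ude00" // U+1F600, emoji outside the BMP (a surrogate pair in Java)
        };
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s), %d char(s), %d code point(s)%n",
                s.codePointAt(0), utf8Length(s), s.length(), s.codePointCount(0, s.length()));
        }
    }
}
```

The emoji needs four UTF-8 bytes and two Java chars, yet it is still a single code point.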

What is the difference between ISO 8859-1 and UTF-8?

UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent only the first 256 Unicode characters (U+0000 to U+00FF). Both encode ASCII exactly the same way.
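A short sketch of those three statements using the standard StandardCharsets constants: ASCII produces identical bytes in both encodings, a character inside the first 256 takes one byte in ISO-8859-1 but two in UTF-8, and a character outside that range cannot be represented in ISO-8859-1 at all. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1VsUtf8 {

    static byte[] latin1(String s) { return s.getBytes(StandardCharsets.ISO_8859_1); }
    static byte[] utf8(String s)   { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        // ASCII text: identical bytes in both encodings.
        System.out.println(Arrays.equals(latin1("Latin"), utf8("Latin"))); // true

        // U+00E9 is inside ISO-8859-1: one byte there, two bytes in UTF-8.
        System.out.println(Arrays.toString(latin1("\u00e9"))); // [-23]
        System.out.println(Arrays.toString(utf8("\u00e9")));   // [-61, -87]

        // U+03A9 (Greek capital omega) is outside ISO-8859-1; getBytes(Charset)
        // substitutes the charset's replacement byte, which is '?'.
        System.out.println(Arrays.toString(latin1("\u03a9"))); // [63]
    }
}
```

Java bytes are signed, so 0xE9 prints as -23 and the UTF-8 pair 0xC3 0xA9 prints as -61, -87.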

Are Java strings UTF-8?

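A Java String behaves as a sequence of UTF-16 code units (since Java 9 it may be stored more compactly internally, but the API is still defined in terms of UTF-16). In practice, think about it like this: an encoding is a way to translate between Strings and bytes.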
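Following that advice, this sketch treats a String and its byte form as separate things: the same two-character String yields different byte arrays under different charsets, and only decoding with the charset used for encoding restores it. The names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringBytesRoundTrip {

    // Encode with one charset and decode with another: only a matching pair
    // is guaranteed to give back the original String.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u03a9a"; // Greek capital omega followed by ASCII 'a'
        System.out.println(s.length());                                   // 2
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 3
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 4
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s));      // true
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(s)); // false
    }
}
```

UTF_16BE is used instead of UTF_16 so that no byte-order mark is added to the byte count.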


2 Answers

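You can force the encoding javac uses to read source files by passing -encoding utf-8 (or -encoding iso-8859-1) when compiling. Just make sure it matches the encoding your .java files are actually saved in. String literals are decoded at this point, when javac reads the source, which would explain why only literals are affected while Strings read from the DB or from files come out fine.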

http://docs.oracle.com/javase/6/docs/technotes/tools/windows/javac.html

-encoding encoding Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
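A minimal sketch of what the -encoding flag changes, using the standard javax.tools compiler API. It needs a full JDK at runtime, because ToolProvider.getSystemJavaCompiler() returns null when no compiler is available. It writes a hypothetical Greeting.java to a temporary directory as UTF-8, compiles it with a given -encoding value, and reads back the literal the compiler stored in the class file. The class names, directory layout, and sample text are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class EncodingFlagDemo {

    // Hypothetical one-line source file containing a Greek string literal.
    // The Unicode escapes are resolved when THIS file is compiled, so SOURCE
    // holds real Greek characters.
    static final String SOURCE = "public class Greeting { public static final String TEXT = \""
        + "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac" + "\"; }";

    // Writes SOURCE to disk as UTF-8, compiles it with "-encoding <javacEncoding>",
    // then loads the class and returns the value the compiler stored in TEXT.
    static String compileAndRead(String javacEncoding) {
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path src = dir.resolve("Greeting.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null without a JDK
            if (javac == null) throw new IllegalStateException("no system Java compiler (JRE only?)");
            int status = javac.run(null, null, null,
                "-encoding", javacEncoding, "-d", dir.toString(), src.toString());
            if (status != 0) throw new IllegalStateException("javac exited with " + status);

            // Parent null: Greeting is loaded only from the temp directory.
            try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() }, null)) {
                return (String) loader.loadClass("Greeting").getField("TEXT").get(null);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matching encoding: 8 characters, identical to the intended text.
        System.out.println(compileAndRead("UTF-8").length());      // 8
        // Mismatched encoding: every UTF-8 byte became its own character.
        System.out.println(compileAndRead("ISO-8859-1").length()); // 16
    }
}
```

With the mismatched flag, the literal is already corrupted inside the .class file, so no response header or Connector setting can repair it afterwards. The temporary directories are not cleaned up, to keep the sketch short.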

answered Sep 27 '22 by benmmurphy


Try setting the file.encoding system property, e.g. -Dfile.encoding=utf-8, on the Linux JVM command line.
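To see which default a JVM actually ended up with, a small check like the following can be run with the same JVM options (for Tomcat these are typically passed through JAVA_OPTS or CATALINA_OPTS, for example in bin/setenv.sh). Note that file.encoding is read by the JVM it is passed to: on Tomcat's JVM it affects runtime conversions that omit a charset, while a string literal is decoded earlier, by whichever JVM runs javac. Since JDK 18 the default charset is UTF-8 regardless of locale. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Explicit charset: the same result on every JVM, whatever file.encoding says.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // No charset argument: uses Charset.defaultCharset(), which depends on
    // file.encoding and the platform locale (UTF-8 by default since JDK 18).
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(utf8).equals("\u00e9"));
        // Also true when the default charset is UTF-8; may be false otherwise.
        System.out.println(decodeWithDefault(utf8).equals("\u00e9"));
    }
}
```

Passing an explicit charset, as in decodeUtf8, sidesteps the platform default entirely, which is why it is generally preferred over relying on file.encoding.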

answered Sep 27 '22 by Bruno Grieder