 

How to set request encoding in Tomcat?


I have a problem in my Java webapp.

Here is the code in index.jsp:

```jsp
<%@page contentType="text/html" pageEncoding="UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<%
    request.setCharacterEncoding("UTF-8");
    response.setCharacterEncoding("UTF-8");
%>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>JSP Page</title>
    </head>
    <body>
        <h1>Hello World!</h1>

        <form action="index.jsp" method="get">
            <input type="text" name="q"/>
        </form>

        Res: <%= request.getParameter("q") %>
    </body>
</html>
```

When I capture a request with Wireshark, I see that my browser sends these headers:

```
GET /kjd/index.jsp?q=%C3%A9 HTTP/1.1\r\n
...
Accept-Charset: UTF-8,*\r\n
```

And the Tomcat server responds with this:

Content-Type: text/html;charset=UTF-8\r\n 

But if I send "é" (%C3%A9 in UTF-8) in my form, "Ã©" is displayed instead.

As I understand it, the browser sends "é" encoded as UTF-8 (the %C3%A9).

But the server interprets this as ISO-8859-1, so %C3 is decoded as Ã and %A9 as ©, and the server then sends back the response encoded in UTF-8.
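The byte-level mix-up described above can be reproduced outside Tomcat with a few lines of Java. This is a minimal sketch using only the JDK (`URLDecoder.decode(String, Charset)` needs Java 10 or later); the class name `MojibakeDemo` and its `decode` helper are hypothetical and simply stand in for whatever charset the server applies to the query string.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Percent-decodes a value, interpreting the resulting bytes with the given charset
    static String decode(String percentEncoded, Charset charset) {
        return URLDecoder.decode(percentEncoded, charset);
    }

    public static void main(String[] args) {
        String sent = "%C3%A9"; // e-acute (U+00E9) as the browser percent-encodes it in UTF-8

        // UTF-8: the two bytes C3 A9 form one sequence -> 1 char, U+00E9
        String asUtf8 = decode(sent, StandardCharsets.UTF_8);

        // ISO-8859-1: every byte is its own char -> 2 chars, U+00C3 U+00A9
        String asLatin1 = decode(sent, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length() + " " + asUtf8);
        System.out.println(asLatin1.length() + " " + asLatin1);
    }
}
```

Because the server then re-encodes those two ISO-8859-1 characters as UTF-8 in the response, the browser shows both of them.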

Yet in the code, I tell Tomcat to decode the request as UTF-8:

request.setCharacterEncoding("UTF-8"); 

But if I send this URL:

http://localhost:8080/kjd/index.jsp?q=%E9 

the "%E9" is decoded as ISO-8859-1 and an "é" is displayed.

Why isn't this working? Why are requests decoded as ISO-8859-1?

I've tried it on Tomcat 6 and 7, and on Windows and Ubuntu.

Guillaume, asked Jul 29 '11 at 17:07



2 Answers

The request.setCharacterEncoding("UTF-8") call only sets the encoding of the request body (which is used by POST requests), not the encoding of the request URI (which is used by GET requests).

You need to set the URIEncoding attribute to UTF-8 in the <Connector> element of Tomcat's /conf/server.xml to make Tomcat parse the request URI (including the query string) as UTF-8. In Tomcat 6 and 7 it indeed defaults to ISO-8859-1. See also the Tomcat HTTP Connector documentation.

<Connector ... URIEncoding="UTF-8"> 

Or, to make Tomcat parse the URI using the same encoding as the body [1]:

<Connector ... useBodyEncodingForURI="true"> 
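To make the role of `URIEncoding` concrete, here is a hypothetical, simplified model of query-string decoding: a `parseQuery` helper (not Tomcat's API and not its actual parser) that splits a raw query string such as `q=%C3%A9` and percent-decodes it with whichever charset the connector would use. It assumes Java 10+ for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringDecoding {

    // Hypothetical helper: only models the role URIEncoding plays when
    // query parameters are decoded; it is not Tomcat's implementation.
    static Map<String, List<String>> parseQuery(String rawQuery, Charset charset) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // The same bytes give different strings depending on the charset passed in
            params.computeIfAbsent(URLDecoder.decode(name, charset), k -> new ArrayList<>())
                  .add(URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        String rawQuery = "q=%C3%A9";
        // Like Tomcat 6/7 with URIEncoding unset: two characters, U+00C3 U+00A9
        System.out.println(parseQuery(rawQuery, StandardCharsets.ISO_8859_1).get("q"));
        // Like URIEncoding="UTF-8": one character, U+00E9
        System.out.println(parseQuery(rawQuery, StandardCharsets.UTF_8).get("q"));
    }
}
```

Only the charset argument changes between the two calls, which mirrors the point of the answer: the fix is a connector setting, not something request.setCharacterEncoding() controls.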

See also:

  • Unicode - How to get the characters right? - JSP/Servlet request

[1] From Tomcat's documentation (emphasis mine):

This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false.
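Putting the two attributes together, the charset Tomcat 6/7 use for query parameters can be summarized roughly as below. This is a simplified model written for illustration from the documentation quoted above, not Tomcat source code; `queryCharset` and its parameters are hypothetical, and `null` stands for an attribute that is not set or a request whose character encoding was never set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryCharsetRule {

    // Hypothetical, simplified model of the Connector behaviour described above.
    // uriEncoding:           the Connector's URIEncoding attribute (null = not set)
    // useBodyEncodingForURI: the Connector's useBodyEncodingForURI attribute
    // bodyEncoding:          the request's character encoding, e.g. from
    //                        request.setCharacterEncoding() (null = not set)
    static Charset queryCharset(String uriEncoding, boolean useBodyEncodingForURI, String bodyEncoding) {
        if (useBodyEncodingForURI) {
            // Query parameters follow the body encoding, falling back to ISO-8859-1
            return bodyEncoding != null ? Charset.forName(bodyEncoding) : StandardCharsets.ISO_8859_1;
        }
        // Otherwise only URIEncoding matters; Tomcat 6/7 default to ISO-8859-1
        return uriEncoding != null ? Charset.forName(uriEncoding) : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // The question's setup: setCharacterEncoding("UTF-8"), no Connector attributes
        System.out.println(queryCharset(null, false, "UTF-8"));  // ISO-8859-1
        // URIEncoding="UTF-8"
        System.out.println(queryCharset("UTF-8", false, null));  // UTF-8
        // useBodyEncodingForURI="true" together with setCharacterEncoding("UTF-8")
        System.out.println(queryCharset(null, true, "UTF-8"));   // UTF-8
    }
}
```

With the question's setup the model yields ISO-8859-1, which matches the observed behaviour; either connector attribute switches the query string to UTF-8.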


Also, please get rid of those scriptlets in your JSP. The request.setCharacterEncoding("UTF-8") call happens at the wrong moment: it must be made before any request parameter is read, so it would be too late once you properly use a Servlet to process the request. Use a filter for this instead. The response.setCharacterEncoding("UTF-8") part is already done implicitly by pageEncoding="UTF-8" at the top of the JSP.

I also strongly recommend replacing the old-fashioned <%= request.getParameter("q") %> scriptlet with EL ${param.q}, and escaping it as XML with the JSTL function ${fn:escapeXml(param.q)} to prevent XSS attacks.
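To show why the escaping matters, here is a minimal sketch of what XML escaping does to a parameter value before it is written into the page. `escapeXml` below is a hypothetical stand-in written in the spirit of JSTL's `fn:escapeXml`, not the JSTL implementation; the numeric entities chosen for the quote characters are an assumption.

```java
public class XmlEscapeSketch {

    // Replaces the five XML-special characters so user input cannot open tags
    // or break out of attribute values; null becomes an empty string.
    static String escapeXml(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&#034;"); break;
                case '\'': sb.append("&#039;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = "<script>alert('x')</script>";
        // prints &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt;
        System.out.println(escapeXml(q));
    }
}
```

The browser renders the escaped value as literal text instead of executing it as a script.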

BalusC, answered Sep 21 '22 at 10:09


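You just need to uncomment the portion below in conf/web.xml (Tomcat's server-wide web.xml). It applies a filter to all requests that sets their character encoding to UTF-8; as its own comment says, this is the encoding used to decode parameters in a POST request.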

```xml
<!-- A filter that sets character encoding that is used to decode -->
<!-- parameters in a POST request -->
<filter>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>

<!-- The mapping for the Set Character Encoding Filter -->
<filter-mapping>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```

That's it. It works fine in Tomcat.

Divyesh Kanzariya, answered Sep 18 '22 at 10:09