 

Passing request parameters as UTF-8 encoded strings [duplicate]

I am creating a simple login page and I want to pass login and password parameters as UTF-8 encoded strings. As you can see in the code below, the first line is where I set encoding to UTF-8, but it seems this is pointless because it doesn't work. When I use login and password parameters with accents the result page receives strange characters.

How to set character encoding correctly in a way that works in all browsers?

<%@page contentType="text/html" pageEncoding="UTF-8"%>
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>My Page</title>
    </head>

    <body>
        <h1>Welcome to My Page</h1>

        <form name="login" action="login.jsp" method="POST">
            Login:<br/>
            <input type="text" name="login" value="" /><br/>
            Password:<br/>
            <input type="password" name="password" value="" /><br/>
            <br/>
            <input type="submit" value="Login" /><br/>
        </form>

    </body>
</html>
ceklock asked Jun 12 '12 18:06



2 Answers

The pageEncoding attribute sets only the response character encoding and the charset attribute of the HTTP Content-Type header. In effect, it tells the server to encode the characters produced by the JSP as UTF-8 before sending them to the client, and the header tells the client to decode them as UTF-8 and to use the same encoding when any form on that very page is submitted back to the server. The contentType already defaults to text/html, so the following is sufficient:

<%@page pageEncoding="UTF-8"%> 

The HTML meta tag is ignored when the page is served over HTTP, because the charset in the HTTP Content-Type header takes precedence. It is only used when the client saves the page as an HTML file on the local disk and then opens it in the browser via a file:// URI.

In your particular case, the HTTP request body encoding has apparently not been set to UTF-8. It needs to be set with ServletRequest#setCharacterEncoding() in the servlet or a filter, before the first call to request.getXxx() is made in any servlet or filter involved in the request.

request.setCharacterEncoding("UTF-8");
String login = request.getParameter("login");
String password = request.getParameter("password");
// ...
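Why the call order matters can be illustrated with a toy model. The sketch below is not the servlet API: ToyRequest is a hypothetical class, written only to mimic the behaviour described above, where parameters are parsed lazily on the first getParameter() call using whatever encoding is set at that moment (assumed here to default to ISO-8859-1).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical class, NOT javax.servlet): it mimics a request
// whose form parameters are decoded once, on first access.
class ToyRequest {
    private final String rawBody;            // e.g. "login=jos%C3%A9"
    private String encoding = "ISO-8859-1";  // assumed default when nothing is set
    private Map<String, String> parsed;      // filled on the first getParameter() call

    ToyRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String encoding) {
        this.encoding = encoding;
    }

    String getParameter(String name) {
        if (parsed == null) {
            parsed = parse();  // decoding happens here, exactly once
        }
        return parsed.get(name);
    }

    private Map<String, String> parse() {
        Map<String, String> result = new HashMap<>();
        for (String pair : rawBody.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;  // skip malformed or empty pairs
            }
            try {
                result.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Encoding set before the first read: the value decodes correctly.
        ToyRequest early = new ToyRequest("login=jos%C3%A9");
        early.setCharacterEncoding("UTF-8");
        System.out.println("jos\u00e9".equals(early.getParameter("login")));

        // Encoding set after the first read: the garbled value is already cached.
        ToyRequest late = new ToyRequest("login=jos%C3%A9");
        late.getParameter("login");
        late.setCharacterEncoding("UTF-8");
        System.out.println("jos\u00e9".equals(late.getParameter("login")));
    }
}
```

In this model the second request keeps returning the ISO-8859-1 interpretation no matter what is set afterwards, which is why the real call belongs in the earliest filter of the chain.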

See also:

  • How to set request encoding in Tomcat?
  • Why does POST not honor charset, but an AJAX request does? tomcat 6
  • https://stackoverflow.com/questions/14177914/passing-turkish-char-from-form-to-java-class-with-struts2/
  • Unicode - How to get the characters right?
BalusC answered Sep 27 '22 18:09



Calling ServletRequest#setCharacterEncoding() will still fail in some cases.

If your container follows the servlet spec carefully (as Tomcat does), it will interpret POST parameters as ISO-8859-1 by default. This may garble UTF-8 characters (such as Japanese, in the recent case I worked through) before they ever get to your code, especially if you have a servlet filter that inspects the request parameters with getParameter() or one of its siblings, such as getParameterValues() or getParameterMap(). Those methods force decoding of the parameters, and decoding is only ever done once.
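The garbling itself can be reproduced with the JDK alone. The following is a minimal sketch rather than container code: FormDecodingDemo and its two helper methods are names made up for illustration, and the sample character é (U+00E9) is an arbitrary choice. It percent-encodes a value as a browser would for a UTF-8 page, then decodes it under each charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative helper class (hypothetical name), using only java.net.
class FormDecodingDemo {
    // Percent-encodes a form value in the given charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes a percent-encoded form value in the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sent = encode("\u00e9", "UTF-8");  // U+00E9 becomes two bytes: %C3%A9
        System.out.println(sent);

        // Decoding with the matching charset restores the single character.
        System.out.println(decode(sent, "UTF-8").length());       // 1

        // Decoding as ISO-8859-1 turns each byte into its own character
        // (U+00C3 followed by U+00A9), which is the typical mojibake.
        System.out.println(decode(sent, "ISO-8859-1").length());  // 2
    }
}
```

The two-character result is what the login page in the question ends up displaying when the body is decoded with the default charset.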

Here's a link for how to get around this in Tomcat if you have filters that look at the request parameters. Folks will want to check the docs for their particular container.

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1

The key bit from that is:

Add

useBodyEncodingForURI="true" URIEncoding="UTF-8" 

to the Connector element in Tomcat's server.xml, and add

  <filter>
    <filter-name>Character Encoding Filter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>Character Encoding Filter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

in web.xml, before any filter that calls getParameter() or one of its siblings (getParameterValues(), getParameterMap()). I found that although the link above makes the two Connector attributes seem like alternatives, useBodyEncodingForURI is absolutely necessary; otherwise Tomcat won't set the encoding for the query string. From Request.java in Tomcat 7.0.42:

boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
if (enc != null) {
    parameters.setEncoding(enc);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding(enc);
    }
} else {
    parameters.setEncoding
        (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding
            (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    }
}
Gus answered Sep 27 '22 20:09
