 

UTF-8 encoding with form post and Spring Controller

I am trying to submit a form that contains UTF-8 characters. The form looks like this:

<form id="workflowPersistForm" accept-charset="UTF-8" method="post" action="/workflow-next">
  <input id="stateGlobal" type="hidden" value=" お問い合わせ" name="state">
</form>

My server is Spring-based. My web.xml already has the encoding filter:

 <filter>
     <filter-name>EncodingFilter</filter-name>
     <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
     <init-param>
         <param-name>encoding</param-name>
         <param-value>UTF-8</param-value>
     </init-param>
     <init-param>
         <param-name>forceEncoding</param-name>
         <param-value>true</param-value>
     </init-param>
 </filter>

The problem is that the UTF-8 characters get garbled somewhere along the way. I put a breakpoint at the very start of the controller, and the characters are already garbled at that point. Also, if I generate UTF-8 characters inside the controller, they are rendered correctly in the browser. Only on form POST does the controller fail to receive the characters properly.

Any idea what I might be doing wrong?

Edit: It looks like the data on the new page is not garbled but double-encoded. I don't understand why it is double-encoded.

Edit 2: When I change the form method from POST to GET, everything works perfectly. I have no idea what POST is breaking.

Bulbasaur asked Apr 16 '13 05:04


3 Answers

It looks like browsers don't send the charset as part of the Content-Type request header (even when accept-charset is set on the form), and Tomcat then decodes the body of such requests as Latin-1 (ISO-8859-1); see http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1.
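
To illustrate the charset mismatch described here, a minimal Java sketch using only the JDK follows. It does not involve Tomcat itself: the class name FormBodyDecoding and its helper methods are hypothetical, and decoding with ISO-8859-1 merely stands in for what a container does when the request declares no charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormBodyDecoding {
    // The form value from the question, written with Unicode escapes so the
    // source compiles the same way regardless of the file's own encoding.
    static final String ORIGINAL = "\u304A\u554F\u3044\u5408\u308F\u305B";

    // Roughly what a browser does with accept-charset="UTF-8":
    // percent-encode the UTF-8 bytes of the value.
    static String encodeAsBrowser(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for a server decoding the body with a given charset.
    static String decodeAsServer(String body, String charset) {
        try {
            return URLDecoder.decode(body, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = encodeAsBrowser(ORIGINAL);
        System.out.println(body);
        // Decoding with the charset the browser used restores the text.
        System.out.println(decodeAsServer(body, "UTF-8").equals(ORIGINAL));
        // Decoding with Latin-1 turns every UTF-8 byte into its own character.
        System.out.println(decodeAsServer(body, "ISO-8859-1").equals(ORIGINAL));
    }
}
```

The body on the wire is identical in both cases; only the charset chosen on the server side decides whether the controller sees the original text.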

So the UTF-8 bytes were probably decoded as Latin-1 and later re-encoded as UTF-8 when the page was rendered, which results in the garbled (double-encoded) characters.
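
The decode-then-re-encode sequence can be reproduced with plain JDK string handling. This is only a sketch of the effect, assuming the bytes take exactly that path; DoubleEncodingDemo and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Simulates UTF-8 form bytes being decoded as ISO-8859-1 (Latin-1).
    static String decodeAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value from the question's form, as Unicode escapes.
        String original = "\u304A\u554F\u3044\u5408\u308F\u305B";
        String garbled = decodeAsLatin1(original);

        // 6 characters, 3 UTF-8 bytes each.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length);
        // Each of those bytes became a Latin-1 character that needs 2 bytes
        // in UTF-8, so the rendered page carries twice as many bytes.
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(repair(garbled).equals(original));
    }
}
```

Because Latin-1 maps every byte to exactly one character, the damage is reversible, which is why the text looks double-encoded rather than lost.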

Moving CharacterEncodingFilter to the top of the filter chain in web.xml, so that it runs before anything reads the request parameters, and forcing the encoding to UTF-8 solved the problem.

Bulbasaur answered Oct 16 '22 00:10


Do you have a filter-mapping entry in your web.xml for EncodingFilter?

<filter-mapping>
  <filter-name>EncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

Shinichi Kai answered Oct 16 '22 02:10


I would suggest you remove the CharacterEncodingFilter, which may itself be the cause of double encoding.

To debug the situation, you should first check whether the browser is posting the data correctly. Use Firebug (for Firefox) or the developer tools in Chrome (F12).

Most likely, the problem is on the server side. Which server do you use? If you use Tomcat, you need to set the URIEncoding attribute to UTF-8 on the Connector element in server.xml.

Update 1:

It looks very likely that the problem is the forceEncoding that you are setting. As per the docs:

This filter can either apply its encoding if the request does not already specify an encoding, or enforce this filter's encoding in any case ("forceEncoding"="true")
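
The two modes in the quoted documentation can be modelled roughly as follows. This is a simplified sketch of the decision the quote describes, not Spring's actual source; EncodingDecision, resolveEncoding and its parameters are hypothetical names.

```java
public class EncodingDecision {
    // Returns the charset name that would be used to decode the request:
    // the filter's encoding wins when it is forced or when the request
    // declares none; otherwise the request's own encoding is kept.
    static String resolveEncoding(String requestEncoding,
                                  String filterEncoding,
                                  boolean forceEncoding) {
        if (filterEncoding != null && (forceEncoding || requestEncoding == null)) {
            return filterEncoding;
        }
        return requestEncoding;
    }

    public static void main(String[] args) {
        // Request declares no charset: the filter's encoding is applied.
        System.out.println(resolveEncoding(null, "UTF-8", false));
        // Request declares a charset and forcing is off: the request wins.
        System.out.println(resolveEncoding("ISO-8859-1", "UTF-8", false));
        // Forcing is on: the filter's encoding overrides the request.
        System.out.println(resolveEncoding("ISO-8859-1", "UTF-8", true));
    }
}
```

In this model the filter only selects which charset label is used for decoding; it does not transform the bytes themselves.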

When you do a GET, there is no encoding specified, so it makes sense that it works.

However, when you do the POST, the encoding is already applied and then (it seems) is applied again because of forceEncoding=true.

arahant answered Oct 16 '22 02:10