Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UTF-8 text is garbled when form is posted as multipart/form-data

People also ask

What encoding is multipart form data?

Multipart form data: The ENCTYPE attribute of <form> tag specifies the method of encoding for the form data. It is one of the two ways of encoding the HTML form. It is specifically used when file uploading is required in HTML form. It sends the form data to server in multiple parts because of large size of file.

Can UTF-8 handle all languages?

UTF-8 supports any unicode character, which pragmatically means any natural language (Coptic, Sinhala, Phonecian, Cherokee etc), as well as many non-spoken languages (Music notation, mathematical symbols, APL). The stated objective of the Unicode consortium is to encompass all communications.

What does UTF-8 mean on mail?

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.

What is multipart form data in API?

Multipart/Form-Data is a popular format for REST APIs, since it can represent each key-value pair as a “part” with its own content type and disposition. Each part is separated by a specific boundary string, and we don't explicitly need Percent Encoding for their values.


I had the same problem using Apache commons-fileupload. I did not find out what causes the problems especially because I have the UTF-8 encoding in the following places: 1. HTML meta tag 2. Form accept-charset attribute 3. Tomcat filter on every request that sets the "UTF-8" encoding

-> My solution was to especially convert Strings from ISO-8859-1 (or whatever is the default encoding of your platform) to UTF-8:

new String (s.getBytes ("iso-8859-1"), "UTF-8");

hope that helps

Edit: starting with Java 7 you can also use the following:

new String (s.getBytes (StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

Just use Apache commons upload library. Add URIEncoding="UTF-8" to Tomcat's connector, and use FileItem.getString("UTF-8") instead of FileItem.getString() without charset specified.

Hope this help.


I got stuck with this problem and found that it was the order of the call to

request.setCharacterEncoding("UTF-8");

that was causing the problem. It has to be called before any all call to request.getParameter(), so I made a special filter to use at the top of my filter chain.

https://rogerkeays.com/servletrequest-setcharactercoding-ignored


I had the same problem and it turned out that in addition to specifying the encoding in the Filter

request.setCharacterEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

it is necessary to add "acceptcharset" to the form

<form method="post" enctype="multipart/form-data" acceptcharset="UTF-8" > 

and run the JVM with

-Dfile.encoding=UTF-8

The HTML meta tag is not necessary if you send it in the HTTP header using response.setCharacterEncoding().