I've gone through every question about UTF-8 encoding in HTML, and nothing seems to make it work as expected.
I added the meta tag: nothing changed.
I added the accept-charset attribute to the form: nothing changed.
<%@ page pageEncoding="UTF-8" %>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>Editer les sous-titres</title>
</head>
<body>
<form method="post" action="/Subtitlor/edit" accept-charset="UTF-8">
<h3 name="nameOfFile"><c:out value="${ nameOfFile }"/></h3>
<input type="hidden" name="nameOfFile" id="nameOfFile" value="${ nameOfFile }"/>
<c:if test="${ !saved }">
<input value ="Enregistrer le travail" type="submit" style="position:fixed; top: 10px; right: 10px;" />
</c:if>
<a href="/Subtitlor/" style="position:fixed; top: 50px; right: 10px;">Retour à la page d'accueil</a>
<c:if test="${ saved }">
<div style="position:fixed; top: 90px; right: 10px;">
<c:out value="Travail enregistré dans la base de donnée"/>
</div>
</c:if>
<table border="1">
<c:if test="${ !saved }">
<thead>
<th style="font-weight:bold">Original Line</th>
<th style="font-weight:bold">Translation</th>
<th style="font-weight:bold">Already translated</th>
</thead>
</c:if>
<c:forEach items="${ subtitles }" var="line" varStatus="status">
<tr>
<td style="text-align:right;"><c:out value="${ line }" /></td>
<td><input type="text" name="line${ status.index }" id="line${ status.index }" size="35" /></td>
<td style="text-align:right"><c:out value="${ lines[status.index].content }"/></td>
</tr>
</c:forEach>
</table>
</form>
</body>
</html>
for (int i = 0 ; i < 2; i++){
System.out.println(request.getParameter("line"+i));
}
Et ton père et sa soeur
Il ne sera jamais parti.
I added the meta tag: nothing changed.
It indeed has no effect when the page is served over HTTP rather than from the local disk file system (i.e. when the page's URL is http://... instead of e.g. file://...). Over HTTP, the charset in the HTTP response header is used. You've already set it as below:
<%@page pageEncoding="UTF-8"%>
This will not only write out the HTTP response using UTF-8, but also set the charset attribute in the Content-Type response header. The web browser uses that attribute to interpret the response and to encode any HTML form parameters.
I added the accept-charset attribute in form: nothing changed.
It only has an effect in Microsoft Internet Explorer, and even there it does it wrongly. Never use it. All real web browsers will instead use the charset attribute specified in the Content-Type header of the response. Even MSIE will do it the right way as long as you do not specify the accept-charset attribute. As said before, you have already set it properly via pageEncoding.
Get rid of both the meta tag and the accept-charset attribute. They have no useful effect; they will only confuse you in the long term and even make things worse when the end user uses MSIE. Just stick to pageEncoding. Instead of repeating pageEncoding in every JSP page, you could also set it globally in web.xml as below:
<jsp-config>
<jsp-property-group>
<url-pattern>*.jsp</url-pattern>
<page-encoding>UTF-8</page-encoding>
</jsp-property-group>
</jsp-config>
As said, this tells the JSP engine to write the HTTP response output using UTF-8 and to set it in the HTTP response header too. The web browser will use the same charset to encode the HTTP request parameters before sending them back to the server.
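To see why both sides must agree on the charset, here is a minimal sketch using only the JDK. The class and method names are made up for illustration. It encodes a string as UTF-8, the way the browser would after receiving a UTF-8 page, and then decodes the bytes as ISO-8859-1, which is what a server that has not been told otherwise commonly assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "père", written with an escape so the source file encoding does not matter.
        String original = "p\u00e8re";

        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // UTF-8 bytes read as ISO-8859-1: the two bytes of "è" turn into two separate characters.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The second call is the kind of mismatch that typically produces garbled accented characters in getParameter() results.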
Your only missing step is to tell the server that it must use UTF-8 to decode the HTTP request parameters before returning them in getParameterXxx() calls. How to do that globally depends on the HTTP request method. Given that you're using the POST method, this is relatively easy to achieve with the servlet filter class below, which automatically hooks into all requests:
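import java.io.IOException;
// javax.servlet applies to Tomcat 9 and older; Tomcat 10+ uses jakarta.servlet instead.
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;

@WebFilter("/*")
public class CharacterEncodingFilter implements Filter {
    @Override
    public void init(FilterConfig config) throws ServletException {
        // NOOP.
    }
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        // Must run before anything reads the request parameters.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }
    @Override
    public void destroy() {
        // NOOP.
    }
}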
That's all. In Servlet 3.0+ (Tomcat 7 and newer) you don't need any additional web.xml configuration.
You only need to keep in mind that it's very important that the setCharacterEncoding() method is called before the POST request parameters are obtained for the first time via any of the getParameterXxx() methods. This is because they are parsed only once, on first access, and then cached in server memory.
So, for example, the sequence below is wrong:
String foo = request.getParameter("foo"); // Wrong encoding.
// ...
request.setCharacterEncoding("UTF-8"); // Attempt to set it.
String bar = request.getParameter("bar"); // STILL wrong encoding!
Doing the setCharacterEncoding() job in a servlet filter guarantees that it runs in time (at least, before any servlet).
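The parse-once-then-cache behaviour described above can be imitated with a small stand-in class. This is only a sketch: LazyParamRequest is a hypothetical class written for this illustration, not part of the servlet API or of any container, and it assumes Java 10 or newer for the URLDecoder.decode(String, Charset) overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request object; it only mimics lazy parameter parsing.
class LazyParamRequest {
    private final String rawBody;                            // e.g. "foo=p%C3%A8re&bar=p%C3%A8re"
    private Charset encoding = StandardCharsets.ISO_8859_1;  // assumed default when nothing is set
    private Map<String, String> cache;                       // filled on the first getParameter() call

    LazyParamRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(Charset encoding) {
        this.encoding = encoding;
    }

    String getParameter(String name) {
        if (cache == null) {
            // All parameters are decoded once, with whatever encoding is set right now.
            cache = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                String[] keyValue = pair.split("=", 2);
                String value = keyValue.length > 1 ? keyValue[1] : "";
                cache.put(URLDecoder.decode(keyValue[0], encoding), URLDecoder.decode(value, encoding));
            }
        }
        return cache.get(name);
    }

    public static void main(String[] args) {
        LazyParamRequest tooLate = new LazyParamRequest("foo=p%C3%A8re&bar=p%C3%A8re");
        tooLate.getParameter("foo");                            // triggers parsing with the default
        tooLate.setCharacterEncoding(StandardCharsets.UTF_8);   // no effect on the cached values
        System.out.println(tooLate.getParameter("bar"));        // still garbled

        LazyParamRequest inTime = new LazyParamRequest("foo=p%C3%A8re&bar=p%C3%A8re");
        inTime.setCharacterEncoding(StandardCharsets.UTF_8);    // set before the first access
        System.out.println(inTime.getParameter("bar"));         // decoded correctly
    }
}
```

A filter mapped to /* plays the role of the inTime case: it sets the encoding before any servlet gets a chance to touch the parameters.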
In case you'd like to instruct the server to decode GET (not POST) request parameters using UTF-8 too (the parameters you see after the ? character in the URL), then you basically need to configure it on the server end. It's not possible to configure it via the servlet API. If you're using Tomcat as the server, for example, then it's a matter of adding the URIEncoding="UTF-8" attribute to the <Connector> element of Tomcat's own /conf/server.xml.
In case you're still seeing mojibake in the console output of System.out.println() calls, then chances are high that stdout itself is not configured to use UTF-8. How to fix that depends on who is responsible for interpreting and presenting stdout. If you're using Eclipse as IDE, for example, then it's a matter of setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.
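Independently of the IDE setting, the Java side can be made explicit about the bytes it writes. The following is a minimal sketch, assuming Java 10 or newer for the PrintStream constructor that takes a Charset. The class and method names are made up for illustration, and the console that displays the output must still interpret those bytes as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8Console {
    // Wraps any output stream so that printed text is always written as UTF-8 bytes.
    static PrintStream utf8Stream(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Escapes are used so the result does not depend on the source file encoding.
        out.println("Et ton p\u00e8re et sa s\u0153ur");
    }
}
```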
Let me start with the universal fact we all know: a computer understands nothing but bits, 0s and 1s.
When you submit an HTML form over HTTP and the values travel over the wire to the destination server, essentially a whole lot of bits, 0s and 1s, are being passed along.
An analogy: I send you a letter and tell you whether it is written in English, French, or Dutch, so that you get exactly the message I intended to send. When replying, you likewise mention in which language I should read your answer.
The important takeaway is that data is encoded when it leaves the client and decoded on the server side, and vice versa. If you do not specify anything, the content is encoded as application/x-www-form-urlencoded before it travels from the client to the server.
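As a rough illustration of what application/x-www-form-urlencoded data looks like on the wire, the sketch below builds a single form field the way a browser would, once per charset. It assumes Java 10 or newer for URLEncoder.encode(String, Charset); the class and method names are made up, and "line0" is simply the first input name generated by the JSP in the question.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormBodyDemo {
    // Produces one name=value pair in application/x-www-form-urlencoded form.
    static String encodeField(String name, String value, Charset charset) {
        return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "p\u00e8re"; // "père"

        // In UTF-8 the "è" is sent as two percent-encoded bytes: %C3%A8.
        System.out.println(encodeField("line0", value, StandardCharsets.UTF_8));

        // In ISO-8859-1 the same character is sent as a single byte: %E8.
        System.out.println(encodeField("line0", value, StandardCharsets.ISO_8859_1));
    }
}
```

The server only receives the percent-encoded bytes, so it has to be told separately which charset to use when turning them back into characters.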
Reading the warm-up above is important. There are a couple of things you need to make sure of to get the expected results.
To ensure this, several approaches are discussed, but I would say: use the HTTP Accept-Charset request-header field. Per your code snippet, you are already using it, and using it correctly, so you are fine on that front.
Some people will tell you not to use it, or that it is not implemented, but I would very humbly disagree with them. Accept-Charset is part of the HTTP 1.1 specification (I have provided a link), and a browser implementing HTTP 1.1 will implement it. They may also argue that you should use the "charset" attribute of the Accept request-header field, but
I am giving you data and facts, not just words, but if you are still not satisfied, run the following test in different browsers.
Set accept-charset="ISO-8859-1" in your HTML form and POST/GET a form containing Chinese or advanced French characters to the server. You will see that you never get the expected characters at the server. But if you use the same encoding scheme on both sides, you will see the expected characters. So browsers do implement accept-charset, and its effect kicks in.
There are a great many ways to achieve this (sometimes extra configuration is required for a specific scenario, but the options below solve 95% of cases and hold for your case as well). For example: a character encoding filter, setCharacterEncoding on the request and response, the -Dfile.encoding=utf8 JVM option, etc. My favorite is the first one, the "Character Encoding Filter", and it will solve your problem as well.
You can do the following to implement your own character encoding filter. If you are using a framework such as Spring, you do not need to write your own class; you only need the configuration in web.xml.
The core logic below is very similar to what Spring does, apart from the many dependency and bean-awareness features it adds.
web.xml (configuration)
<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>
com.sks.hagrawal.EncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
EncodingFilter (character encoding implementation class)
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class EncodingFilter implements Filter {
    private String encoding = "UTF-8";
    private boolean forceEncoding = false;

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain filterChain) throws IOException, ServletException {
        request.setCharacterEncoding(encoding);
        if (forceEncoding) { // When forced, set the response stream encoding as well.
            response.setCharacterEncoding(encoding);
        }
        filterChain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Both values come from the <init-param> entries in web.xml.
        String encodingParam = filterConfig.getInitParameter("encoding");
        String forceEncodingParam = filterConfig.getInitParameter("forceEncoding");
        if (encodingParam != null) {
            encoding = encodingParam;
        }
        if (forceEncodingParam != null) {
            forceEncoding = Boolean.parseBoolean(forceEncodingParam);
        }
    }

    @Override
    public void destroy() {
        // Nothing to clean up.
    }
}
The second approach is essentially the same code as in the character encoding filter, but instead of doing it in a filter, you do it in your servlet or controller class.
The idea is again to use request.setCharacterEncoding("UTF-8"); to set the encoding of the HTTP request stream before you start reading from it.
Try the code below and you will see that, if you are not using some sort of filter to set the encoding on the request object, the first log line will be null while the second will be "UTF-8".
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
request.setCharacterEncoding("UTF-8");
System.out.println("CharacterEncoding = " + request.getCharacterEncoding());
Below is an important excerpt from the setCharacterEncoding Javadoc. Another thing to note is that you must provide a valid encoding scheme, or else you will get an UnsupportedEncodingException.
Overrides the name of the character encoding used in the body of this request. This method must be called prior to reading request parameters or reading input using getReader(). Otherwise, it has no effect.
Wherever needed, I have tried my best to provide official links or accepted, bounty-awarded StackOverflow answers, so that you can trust the information.