Suppose I have:
<a href="http://www.yahoo.com/" target="_yahoo" title="Yahoo!™" onclick="return gateway(this);">Yahoo!</a>
<script type="text/javascript">
function gateway(lnk) {
    window.open(SERVLET +
        '?external_link=' + encodeURIComponent(lnk.href) +
        '&external_target=' + encodeURIComponent(lnk.target) +
        '&external_title=' + encodeURIComponent(lnk.title));
    return false;
}
</script>
I have confirmed external_title gets encoded as Yahoo!%E2%84%A2 and passed to SERVLET. If in SERVLET I do:
Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));
I get Yahoo!â„¢ in the browser. If I manually switch the browser character encoding to UTF-8, it changes to Yahoo!™ (which is what I want).
So I figured the encoding I was sending to the browser was wrong (it was Content-type: text/html; charset=ISO-8859-1). I changed SERVLET to:
response.setContentType("text/html; charset=utf-8");
Writer writer = response.getWriter();
writer.write(request.getParameter("external_title"));
Now the browser character encoding is UTF-8, but it outputs Yahoo!⢠and I can't get the browser to render the correct character at all.
My question is: is there some combination of Content-type and/or new String(request.getParameter("external_title").getBytes(), "UTF-8"); and/or something else that will result in Yahoo!™ appearing in the SERVLET output?
You are nearly there. encodeURIComponent correctly encodes to UTF-8, which is what you should always use in a URL today.
The problem is that the submitted query string is getting mutilated on the way into your server-side script, because getParameter() decodes it as ISO-8859-1 by default instead of UTF-8. This stems from Ancient Times before the web settled on UTF-8 for URI/IRI, but it's rather pathetic that the Servlet spec hasn't been updated to match reality, or at least provide a reliable, supported option for it.
(There is request.setCharacterEncoding in Servlet 2.3, but it doesn't affect query string parsing, and if a single parameter has been read before, possibly by some other framework element, it won't work at all.)
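To make the mismatch concrete, here is a minimal sketch that reproduces it outside a servlet, using only the JDK. It assumes the container decodes query parameters as ISO-8859-1, as described above; the class and method names (ParamCharsetDemo, decode, repair) are made up for illustration and are not part of the Servlet API. It also shows the common round-trip workaround, which only makes sense if the container really did use ISO-8859-1. Note that the no-argument getBytes() from the question uses the platform default charset, so it is not a dependable way to recover the original bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class ParamCharsetDemo {

    // Percent-decodes a value using the named charset, roughly what a
    // container does when it builds the parameter map.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Undoes an ISO-8859-1 mis-decoding: recover the original bytes, then
    // read them as UTF-8. Assumes the input really was decoded as ISO-8859-1.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "Yahoo!%E2%84%A2";           // what encodeURIComponent sends
        String wrong = decode(raw, "ISO-8859-1"); // ends in U+00E2 U+0084 U+00A2
        String right = decode(raw, "UTF-8");      // ends in U+2122 (trade mark sign)
        System.out.println(wrong.length() + " vs " + right.length());
        System.out.println(repair(wrong).equals(right));
    }
}
```

In a servlet, the equivalent of repair(request.getParameter("external_title")) combined with a UTF-8 Content-type would be the quick fix; configuring the container to decode URIs as UTF-8 is the cleaner one.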
So you need to futz around with container-specific methods to get proper UTF-8, often involving stuff in server.xml. This totally sucks for distributing web apps that should work anywhere. For Tomcat see https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding and also the question "What's the difference between "URIEncoding" of Tomcat, Encoding Filter and request.setCharacterEncoding".
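I had the same problem and solved it by decoding request.getQueryString() with URLDecoder.decode() and then extracting my parameters:
String[] parameters = URLDecoder.decode(request.getQueryString(), "UTF-8").split("&");
Note that decoding the whole string before splitting breaks if a value contains an encoded & (%26); splitting first and decoding each piece avoids that.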
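A slightly fuller sketch of the same idea, parsing the raw query string by hand and decoding each name and value as UTF-8. QueryStringParser is a hypothetical helper, not a library class; in a servlet you would pass it request.getQueryString(). It assumes one value per parameter name (the last occurrence wins) and splits on the raw & before decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; in a servlet, call parse(request.getQueryString()).
final class QueryStringParser {

    static Map<String, String> parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<>();
        if (queryString == null || queryString.isEmpty()) {
            return params;
        }
        // Split on the raw '&' first so an encoded %26 inside a value survives.
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value)); // last occurrence wins
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

URLDecoder also turns + into a space and throws IllegalArgumentException on malformed percent sequences, so untrusted input may need a try/catch around the call.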