
encoding problem in servlet

I have a servlet that receives some parameters from the client and then does some work. The parameters from the client are Chinese, so I often get invalid characters in the servlet. For example, if I enter

http://localhost:8080/Servlet?q=中文&type=test

Then in the servlet, the 'type' parameter is correct (test), but the 'q' parameter is not decoded correctly; it turns into invalid characters that cannot be parsed.

However, if I submit the URL from the address bar again, it changes to:

http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test

Now my servlet gets the correct value for 'q'.

What is the problem?

UPDATE

BTW, it works well when I send the form with POST. When I send the parameters via Ajax, for example:

url = "http://..q='中文'";
xmlhttp.open("POST", url, true);

Then the server side also gets invalid characters.

It seems that the server side gets the right result only when the Chinese characters are encoded like %xx.

That is to say, http://.../q=中文 does not work, but http://.../q=%D6%D0%CE%C4 works.

But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?

hguser asked Apr 08 '26 03:04


2 Answers

Ensure that the encoding of the page with the form itself is also UTF-8 and ensure that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:

<%@ page pageEncoding="UTF-8" %>

Then, to process GET query string as UTF-8, ensure that the servletcontainer in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector URIEncoding="UTF-8">
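To illustrate why this setting matters, here is a minimal sketch in plain Java. It is not Tomcat's actual decoding code; the class and method names (`QueryDecodingDemo`, `decode`, `repairLatin1AsUtf8`) are made up for the example, and it assumes Java 10 or later for the `Charset` overload of `URLDecoder.decode`. It decodes the same percent-encoded bytes once as UTF-8 and once as ISO-8859-1, which as far as I know was the default URI encoding in older Tomcat versions, and shows that the second result is garbage unless it is re-interpreted afterwards.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics what a container does with a query string.
class QueryDecodingDemo {

    // Percent-decodes a raw query value using the given charset.
    static String decode(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    // Workaround for text that was decoded as ISO-8859-1 although the bytes were UTF-8:
    // turn the chars back into the original bytes and decode them again as UTF-8.
    static String repairLatin1AsUtf8(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 percent-encoding of the two Chinese characters from the question.
        String raw = "%E4%B8%AD%E6%96%87";

        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        // The expected text is written with Unicode escapes so the source file encoding does not matter.
        System.out.println("UTF-8 decode correct:      " + right.equals("\u4e2d\u6587"));
        System.out.println("ISO-8859-1 decode correct: " + wrong.equals("\u4e2d\u6587"));
        System.out.println("Repaired correct:          " + repairLatin1AsUtf8(wrong).equals("\u4e2d\u6587"));
    }
}
```

The repair step is only a stopgap for cases where the container configuration cannot be changed; configuring the connector as shown above avoids the problem at the source.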

If you'd like to use POST, then you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.

request.setCharacterEncoding("UTF-8");

Call this before you access the first parameter. A Filter is the best place for this.

See also:

  • Unicode - How to get the characters right?

BalusC answered Apr 09 '26 16:04


Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
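To make the difference between the two URLs in the question concrete, here is a small sketch in plain Java. The class name `PercentEncodingDemo` and the helper `encode` are made up for illustration, and it assumes Java 10 or later and a JDK that ships the GBK charset (standard JDK distributions normally do). It percent-encodes the same two characters once with GBK and once with UTF-8.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any servlet API.
class PercentEncodingDemo {

    // Percent-encodes text using the named charset.
    static String encode(String text, String charsetName) {
        return URLEncoder.encode(text, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two Chinese characters from the question, written as Unicode escapes
        // so the result does not depend on the source file encoding.
        String text = "\u4e2d\u6587";

        System.out.println(encode(text, "GBK"));   // prints %D6%D0%CE%C4
        System.out.println(encode(text, "UTF-8")); // prints %E4%B8%AD%E6%96%87
    }
}
```

The first output matches the URL that worked for the asker, and the second matches the `q` parameter in the Google URL, so the two requests carry different bytes for the same text and the server has to decode with the matching charset.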

It would probably be safest to switch to POST requests.

Michael Borgwardt answered Apr 09 '26 15:04


