
UTF-8 Strings getting scrambled by Restlet on GAE

I have a simple Restlet service hosted on AppEngine. This performs basic CRUD operations with strings and is working well with all sorts of UTF-8 characters when I test it with curl (for all the verbs).

The service is consumed by a simple Restlet client hosted in a servlet on another AppEngine app:

// set response type
resp.setContentType("application/json");
// Create the client resource
ClientResource resource = new ClientResource(Messages.SERVICE_URL + "myentity/id");
// Customize the referrer property
resource.setReferrerRef("myapp");
// Write the response
resource.get().write(resp.getWriter());

The above is pretty much all I have in the servlet. Very plain.

The servlet is invoked via jQuery Ajax, and the JSON I get back is well formed. The problem is that UTF-8 encoded strings come back scrambled, for example: Université de Montréal becomes Universit?? de Montr??al.

I tried adding this line in the servlet (before everything else):

resp.setCharacterEncoding("UTF-8");

But the only difference is that instead of getting ?? I get Universitᅢᄅ de Montrᅢᄅal (I don't even know what kind of characters those are, Asian I suppose).
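One mechanism that would produce both symptoms, offered as a guess rather than a diagnosis: if the two garbled characters are the halfwidth Hangul letters U+FFC3 and U+FFA9 (an assumption based on how they render), they match the UTF-8 bytes of é (C3 A9) cast one by one to `char` with sign extension instead of being decoded. The sketch below reproduces that in plain Java; `MojibakeSketch`, `castBytesToChars` and `throughCharset` are hypothetical names, and nothing here is taken from Restlet's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {

    // Hypothetical stand-in for a buggy byte-to-char copy: each byte is cast
    // to char without decoding, so bytes >= 0x80 are sign-extended
    // (0xC3 becomes U+FFC3, 0xA9 becomes U+FFA9).
    static String castBytesToChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and decodes it again with the
    // same charset, so characters the charset cannot represent become '?'.
    static String throughCharset(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // bytes C3 A9
        String garbled = castBytesToChars(utf8);

        // Print the code points of the two resulting characters.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }

        // ISO-8859-1 is the servlet default when no encoding is set;
        // it cannot encode either character.
        System.out.println(throughCharset(garbled, StandardCharsets.ISO_8859_1));
    }
}
```

This prints U+FFC3, U+FFA9 and then ??. Under this assumption, the response writer's default ISO-8859-1 encoding would account for the two question marks per accented letter, and switching the response to UTF-8 would merely make the already damaged characters visible.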

I am 100% sure the Restlet service is OK: besides debugging it line by line, I can test it from the command line with curl, and it returns well-formed strings.

Looking at the HTTP headers of the response in Firefox (when calling the servlet via JavaScript), I can see the encoding is indeed UTF-8, as expected. After hours of struggling and reading every related article I could find, I came across this Restlet discussion and noticed that I do have Transfer-Encoding: chunked in the HTTP headers of the response. I tried the proposed solutions: overriding ClientResource.toRepresentation didn't do any good, so I tried Restlet 2.1 as suggested, with ClientResource.setRequestEntityBuffering(true), but no luck there either. In any case, I am not convinced my issue is related to Transfer-Encoding: chunked at all.

At this point I am out of ideas, and I would really appreciate any suggestions! O_o

UPDATE:

I tried doing a manual GET with a classic URLConnection and the string comes back all right:

URL url = new URL(Messages.SERVICE_URL + "myentity/id");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");

resp.getWriter().print(writer.toString()); 

So much for being all RESTful and fancy... but I still have no clue why the original version doesn't work! :/

JohnIdol asked Dec 31 '11 06:12




1 Answer

I tried doing a manual GET with a classic URLConnection and the string comes back all right:

URL url = new URL(Messages.SERVICE_URL + "myentity/id");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();

StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");

resp.getWriter().print(writer.toString());

So much for being all RESTful and fancy... but I still have no clue why the original version doesn't work! :/
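For reference, the same workaround can be sketched without commons-io. This is a minimal sketch, assuming Java 7 or later (for `StandardCharsets` and try-with-resources); `Utf8ReadSketch` and `readUtf8` are hypothetical names, and a `ByteArrayInputStream` stands in for the stream returned by `conn.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {

    // Reads the whole stream into a String, always decoding the bytes as
    // UTF-8 rather than relying on the platform default charset.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the service response body (UTF-8 encoded JSON).
        byte[] body = "{\"name\":\"Universit\u00e9 de Montr\u00e9al\"}"
                .getBytes(StandardCharsets.UTF_8);

        String text = readUtf8(new ByteArrayInputStream(body));

        // Compare against the expected text instead of printing it, so the
        // check does not depend on the console encoding.
        System.out.println(text.contains("Universit\u00e9 de Montr\u00e9al"));
    }
}
```

In the servlet, the resulting string would then go to `resp.getWriter()`, with `resp.setCharacterEncoding("UTF-8")` called before the writer is obtained so the output side also uses UTF-8.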

JohnIdol answered Oct 04 '22 00:10