
Sending UTF-8 values in HTTP headers results in Mojibake

I want to send Arabic data from a servlet to the client using HttpServletResponse.

I am trying this:

response.setCharacterEncoding("UTF-8");
response.setHeader("Info", arabicWord);

On the client, I read the word like this:

String arabicWord = response.getHeader("Info");

On the client (receiving side) I also tried this:

byte[]d = response.getHeader("Info").getBytes("UTF-8");
arabicWord = new String(d);

But the Unicode seems to be lost, because I receive strange English characters. How can I send and receive Arabic UTF-8 words?

Totti asked Jun 26 '12 17:06




1 Answer

HTTP headers don't support UTF-8. They officially support ISO-8859-1 only. See also RFC 2616 section 2.2:

Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
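To see why the client ends up with "strange" Latin-looking characters, here is a minimal sketch of the mismatch. It assumes the server writes the raw UTF-8 bytes into the header and the client decodes them as ISO-8859-1; real containers and clients may behave differently (some replace unsupported characters with `?` instead). The class name `MojibakeDemo` and the sample word are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the servlet API.
final class MojibakeDemo {

    // Simulates the assumed mismatch: UTF-8 bytes on the wire,
    // decoded by the receiver as ISO-8859-1 (one char per byte).
    static String misread(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // True only if every character of the value exists in ISO-8859-1.
    static boolean fitsLatin1(String value) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        String arabicWord = "\u0645\u0631\u062D\u0628\u0627"; // sample Arabic word, 5 letters

        // Arabic letters are outside ISO-8859-1, so the header cannot carry them as-is.
        System.out.println(fitsLatin1(arabicWord)); // false

        // Each Arabic letter takes 2 bytes in UTF-8, and each byte becomes its own char.
        System.out.println(misread(arabicWord).length()); // 10
    }
}
```

Under the same assumption, calling `getBytes("UTF-8")` on the already garbled string, as in the question, only re-encodes the wrong characters, so it cannot restore the original word.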

Your best bet is to URL-encode and decode them.

response.setHeader("Info", URLEncoder.encode(arabicWord, "UTF-8"));

and

String arabicWord = URLDecoder.decode(response.getHeader("Info"), "UTF-8");

URL-encoding will transform them into %nn format, which is perfectly valid ISO-8859-1. Note that the data sent in the headers may be subject to size limits. It is better to send it in the response body instead, in plain text, JSON, CSV or XML format. Using custom HTTP headers this way is a design smell.
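The round trip can be checked without a servlet container. The following is a minimal sketch, assuming Java 10 or newer (for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`); the class name `HeaderValueCodec` and the sample word are made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the servlet API.
final class HeaderValueCodec {

    // Server side: turn any Unicode text into an ASCII-only header value.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Client side: restore the original text from the header value.
    static String decode(String headerValue) {
        return URLDecoder.decode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arabicWord = "\u0645\u0631\u062D\u0628\u0627"; // sample Arabic word

        String wire = encode(arabicWord);
        System.out.println(wire); // %D9%85%D8%B1%D8%AD%D8%A8%D8%A7

        // The encoded form is pure ASCII, so it is also valid ISO-8859-1.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode(wire)); // true

        System.out.println(decode(wire).equals(arabicWord)); // true
    }
}
```

Each Arabic letter becomes two `%nn` escapes, so the encoded value is several times longer than the original, which matters given the header size limits mentioned above.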

BalusC answered Sep 28 '22 03:09