How to send non-English unicode string using HTTP header?

I am a novice at HTTP-related matters. I'm developing for iOS and would like to send a string in an HTTP header, so I'm using:

[httpRequest setValue:@"nonEnglishString" forHTTPHeaderField:@"customHeader"]; 

The receiving server is Python (Google App Engine), which saves the string value in a db model StringProperty using:

dataEntityInstance.nonEnglishString = unicode(self.request.headers.get('customHeader'))

However, when I try to send a non-English string such as Korean, it appears in the HTTP header like this:

Customheader = "\Uc8fc\Uba39\Uc774 \Uc6b4\Ub2e4"; 

and when Google App Engine receives it and saves it in the Datastore, it turns into:

??? ?? 

as if it can't find the proper characters for the Unicode values.
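A quick way to see where the question marks probably come from: if any layer between the client and the Datastore forces the value into ASCII with a replace-on-error policy, every Korean character becomes "?". This is a hedged illustration of one plausible mechanism, not a trace of what iOS or App Engine actually does internally.

```python
# The same code points that appear in the logged header value above.
value = "\uc8fc\uba39\uc774 \uc6b4\ub2e4"

# Assumption: some layer can only emit ASCII and replaces anything it
# cannot encode. Each Hangul syllable becomes "?", the space survives.
mangled = value.encode("ascii", errors="replace").decode("ascii")
print(mangled)  # ??? ??
```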

Is it not POSSIBLE or ALLOWED to send a non-English string in an HTTP header?

If my iOS app just uses setHTTPBody, it can transfer non-English strings and save them to App Engine's Datastore properly:

[httpRequest setHTTPBody:[httpBody dataUsingEncoding:NSUTF8StringEncoding]]; 

But I can't find the right way to achieve the same thing with HTTP headers, as many APIs such as Foursquare's do, while still saving the strings correctly in the Python-based Google App Engine Datastore.

asked Mar 24 '11 17:03 by petershine


People also ask

Can HTTP headers have non ascii characters?

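RFC 2616 allows header text in ISO-8859-1 and permits other character sets only when they are encoded as described in RFC 2047. Its successor, RFC 7230, goes further: newly defined header fields should limit their values to US-ASCII, and recipients should treat any other octets as opaque data.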

Can HTTP headers have special characters?

To be safe, limit a header value to printable US-ASCII: the letters a-z and A-Z, the digits 0-9, the space, and the punctuation characters _ :;.,\/"'?!(){}[]@<>=-+*#$&`|~^%
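The rule above amounts to "printable ASCII only" (code points 0x20 through 0x7E), which also excludes control codes such as CR and LF. A minimal validator, sketched under that assumption; the function name is hypothetical:

```python
def is_safe_header_value(value: str) -> bool:
    """Return True if every character is printable ASCII (space through tilde)."""
    # " " is U+0020 and "~" is U+007E; everything outside that range,
    # including control characters and all non-ASCII text, is rejected.
    return all(" " <= ch <= "~" for ch in value)


print(is_safe_header_value("Hello, world!"))  # True
print(is_safe_header_value("주먹이 운다"))      # False: Hangul is not ASCII
print(is_safe_header_value("a\r\nb"))          # False: CR/LF are control codes
```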

What encoding do HTTP headers use?

RFC 2616 (HTTP/1.1) defines header text as ISO-8859-1, a superset of ASCII that adds umlauts, diacritics, and other characters of Western European languages; RFC 7230 later deprecated non-ASCII octets in header values. The message body is independent of this and can use whatever encoding the charset parameter of the Content-Type header declares.
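Even if a peer honored ISO-8859-1 strictly, Korean would still not fit, because Latin-1 only covers U+0000 to U+00FF. A small check that illustrates this (the helper name is hypothetical):

```python
def fits_latin1(value: str) -> bool:
    """Return True if value can be represented in ISO-8859-1."""
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1("café"))    # True: é is U+00E9, inside Latin-1
print(fits_latin1("주먹이"))   # False: Hangul syllables are outside Latin-1
```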

Are HTTP headers strings?

HTTP header fields are name/value pairs sent by both the client and the server with every HTTP request and response. They are usually invisible to the end user and are processed or logged only by the server and client applications.


2 Answers

Is it not POSSIBLE or ALLOWED to send non-English string using HTTP Header?

It's not possible as per HTTP standards to put non-ISO-8859-1 characters directly in an HTTP header. That gives you ASCII ("English"?) characters plus common Western European diacriticals.

However in practice you can't even use the extended ISO-8859-1 characters, because servers and browsers don't agree on what to do with non-ASCII characters in headers. Safari takes RFC2616 at its word and treats high bytes as ISO-8859-1 characters; Mozilla takes UTF-16 code unit low bytes, which is similar but weirder; Opera and Chrome decode from UTF-8; IE uses the local system code page.
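To make that disagreement concrete: the same bytes on the wire turn into different text depending on which decoder the receiving side picks. A sketch assuming the client sent the UTF-8 bytes of a Korean value:

```python
# The UTF-8 bytes a client might put on the wire for a Korean value.
raw = "주먹".encode("utf-8")

as_utf8 = raw.decode("utf-8")          # a peer that decodes headers as UTF-8
as_latin1 = raw.decode("iso-8859-1")   # a peer that follows RFC 2616 literally

print(as_utf8)          # 주먹
print(len(as_latin1))   # 6: one mojibake character per byte
```

Neither peer is "wrong" by its own rules, which is why non-ASCII header values cannot be relied on.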

So in reality all you can put in an HTTP header is simple ASCII with no control codes. If you want anything more, you'll have to come up with an encoding scheme (e.g. UTF-8+base64). The RFC 2616 standard suggests RFC 2047 encoded-words as a standard form of encoding, but this makes no sense given the definitions of when they are allowable in RFC 2047 itself, and nothing supports it.
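A minimal sketch of the UTF-8+base64 scheme, shown as the server-side (Python) half. The function names are hypothetical, and the client must apply the equivalent transform (UTF-8 bytes, then base64) before setting the header; that side is not shown here.

```python
import base64


def encode_header_value(text: str) -> str:
    """Turn arbitrary Unicode text into an ASCII-only header value."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_header_value(value: str) -> str:
    """Reverse encode_header_value on the receiving side."""
    # validate=True rejects stray non-base64 characters instead of skipping them.
    return base64.b64decode(value.encode("ascii"), validate=True).decode("utf-8")


encoded = encode_header_value("주먹이 운다")
print(encoded)                       # ASCII only, safe to send as a header value
print(decode_header_value(encoded))  # 주먹이 운다
```

Base64 is a convention both ends must agree on; nothing in HTTP will decode it automatically.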

answered Oct 22 '22 12:10 by bobince


It is possible to use character sets other than ISO 8859-1 in HTTP headers, but they must be encoded as described in RFC 2047.
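For reference, Python's standard `email.header` module implements RFC 2047 encoded-words (it was written for mail headers). Reusing it for an HTTP header is an assumption on our part, and as the other answer notes, HTTP servers and clients generally won't decode such values for you, so both ends must opt in:

```python
from email.header import Header, decode_header, make_header

value = "주먹이 운다"

# Produce an RFC 2047 encoded-word of the form =?utf-8?b?...?=
encoded = Header(value, "utf-8").encode()

# decode_header yields (bytes, charset) pairs; make_header reassembles them.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)  # 주먹이 운다
```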

answered Oct 22 '22 12:10 by Ignacio Vazquez-Abrams