After studying the HTTP/1.1 standard, specifically page 31 and related sections, I came to the conclusion that any 8-bit octet can be present in an HTTP header value, i.e. any character with a code in the [0, 255] range.
And yet the HTTP servers I tried refuse to accept anything with a code > 127 (or most US-ASCII non-printable chars).
Here is a condensed excerpt of the grammar used in the standard:
    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>

    CR             = <US-ASCII CR, carriage return (13)>
    LF             = <US-ASCII LF, linefeed (10)>
    SP             = <US-ASCII SP, space (32)>
    HT             = <US-ASCII HT, horizontal-tab (9)>
    CRLF           = CR LF
    LWS            = [CRLF] 1*( SP | HT )
    OCTET          = <any 8-bit sequence of data>
    CHAR           = <any US-ASCII character (octets 0 - 127)>
    CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
    TEXT           = <any OCTET except CTLs, but including LWS>

    token          = 1*<any CHAR except CTLs or separators>
    separators     = "(" | ")" | "<" | ">" | "@"
                   | "," | ";" | ":" | "\" | <">
                   | "/" | "[" | "]" | "?" | "="
                   | "{" | "}" | SP | HT
    quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext         = <any TEXT except <">>
    quoted-pair    = "\" CHAR
As you can see, field-content can be a quoted-string, which is a quoted sequence of TEXT (i.e. any 8-bit octet except " and values in the [0-8, 11-12, 14-31, 127] range) or quoted-pair (\ followed by any value in the [0, 127] range). That is, any 8-bit char sequence can be passed by quoting it and prefixing special symbols with \.
(Note that the standard doesn't treat the NUL (0x00) char in any special way.)
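To make that reading of the grammar concrete, here is a minimal Python sketch of it. The names is_text_octet and quote_rfc2616 are made up for illustration, and the code only mirrors the interpretation described above (including treating HT, LF, CR and SP as TEXT); it does not show what real servers accept.

```python
# Octet classes as read from the RFC 2616 grammar excerpt above.
CTL = set(range(0, 32)) | {127}   # control characters plus DEL
LWS_OCTETS = {9, 10, 13, 32}      # HT, LF, CR, SP

def is_text_octet(b: int) -> bool:
    """TEXT = any OCTET except CTLs, but including LWS (as read above)."""
    return b not in CTL or b in LWS_OCTETS

def quote_rfc2616(raw: bytes) -> bytes:
    """Wrap raw octets in a quoted-string, using quoted-pair where needed.

    Hypothetical helper: the double quote, the backslash and every non-TEXT
    octet are escaped as "\\" CHAR. Escaping the backslash is a choice made
    here so the result can be unescaped unambiguously.
    """
    out = bytearray(b'"')
    for b in raw:
        if b in (0x22, 0x5C) or not is_text_octet(b):
            # Every octet reaching this branch is <= 127, so it is a CHAR.
            out += bytes([0x5C, b])
        else:
            # Octets above 127 are TEXT under this reading and pass through.
            out.append(b)
    out.append(0x22)
    return bytes(out)
```

Under this reading, UTF-8 bytes would travel unescaped inside the quotes, which is exactly what the servers I tried reject.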
But obviously either all the servers I tried are non-conforming, or the standard has changed since 1999, or I can't read it properly.
So... which characters are allowed in HTTP header values and why?
P.S. The reason behind all of this: I am looking for a way to pass a UTF-8-encoded sequence in an HTTP header value (without additional encoding, if possible).
HTTP headers let the client and the server pass additional information with an HTTP request or response. An HTTP header consists of its case-insensitive name followed by a colon ( : ), then by its value. Whitespace before the value is ignored.
field-name cannot have spaces.
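As a rough illustration of those rules, a header line can be split as below. This is only a sketch: parse_header_line is a hypothetical helper, and the TOKEN pattern is my own transcription of the token rule (printable US-ASCII minus the separators), not code taken from any library.

```python
import re
from typing import Tuple

# token = 1*<any CHAR except CTLs or separators>, written out as a character class.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def parse_header_line(line: str) -> Tuple[str, str]:
    """Split 'Name: value' into a lowercased name and a trimmed value."""
    name, sep, value = line.partition(":")
    if not sep or not TOKEN.fullmatch(name):
        # Covers a missing colon, an empty name, and names containing spaces.
        raise ValueError("invalid header field-name: %r" % name)
    # Names are case-insensitive; whitespace around the value is ignored.
    return name.lower(), value.strip(" \t")
```

For example, parse_header_line("Content-Type:  text/html; charset=UTF-8") gives ("content-type", "text/html; charset=UTF-8"), while a name with a space in it raises ValueError.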
RFC 2616 says that you can ONLY use US-ASCII in HTTP headers. Other characters have to be encoded.
Common Response Headers

The first line of the response is mandatory and consists of the protocol (HTTP/1.1), the response code (200), and a description (OK). The headers shown are: Content-Type: this is text/html, which is a web page. It also includes the character set, which is UTF-8.
RFC 2616 is obsolete; the relevant part has been replaced by RFC 7230.
The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)
In essence, RFC 2616 defaulted to ISO-8859-1, and this was both insufficient and not interoperable anyway. Thus, RFC 7230 has deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as the one defined in RFC 8187, or plain URI percent-encoding).
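For the UTF-8 use case from the question, the RFC 8187 style of escaping could look roughly like this in Python. It is a sketch under assumptions: encode_ext_value and decode_ext_value are hypothetical names, the language tag is left empty, and only the UTF-8 charset is handled. RFC 8187 defines this form for header parameters (written with a trailing *, as in title*=...), so whether it fits a custom header depends on both ends agreeing on it.

```python
from urllib.parse import quote, unquote

# attr-char in RFC 8187 is ALPHA / DIGIT plus these marks; quote() already
# leaves letters, digits and "_.-~" unescaped, so only the rest is listed.
ATTR_CHAR_EXTRA = "!#$&+^`|"

def encode_ext_value(text: str) -> str:
    """Build charset'language'value-chars with an empty language tag."""
    return "UTF-8''" + quote(text, safe=ATTR_CHAR_EXTRA, encoding="utf-8")

def decode_ext_value(ext: str) -> str:
    """Reverse encode_ext_value; only the UTF-8 charset is accepted here."""
    charset, _language, encoded = ext.split("'", 2)
    if charset.upper() != "UTF-8":
        raise ValueError("unsupported charset: %r" % charset)
    return unquote(encoded, encoding="utf-8", errors="strict")

# The result contains only US-ASCII, so it fits in any header value.
header_value = encode_ext_value("\u00a3 and \u20ac rates")
```

The price is that the receiver has to know about the convention and decode the value itself, which is the "additional encoding" the question hoped to avoid.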