
What characters are allowed in HTTP header values?

After studying the HTTP/1.1 standard, specifically page 31 and related sections, I came to the conclusion that any 8-bit octet can be present in an HTTP header value, i.e. any character with a code in the [0, 255] range.

And yet the HTTP servers I tried refuse to accept anything with a code > 127 (or most non-printable US-ASCII characters).

Here is a condensed excerpt of the grammar used in the standard:

    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value and consisting of
                     either *TEXT or combinations of token, separators, and
                     quoted-string>

    CR             = <US-ASCII CR, carriage return (13)>
    LF             = <US-ASCII LF, linefeed (10)>
    SP             = <US-ASCII SP, space (32)>
    HT             = <US-ASCII HT, horizontal-tab (9)>
    CRLF           = CR LF
    LWS            = [CRLF] 1*( SP | HT )
    OCTET          = <any 8-bit sequence of data>
    CHAR           = <any US-ASCII character (octets 0 - 127)>
    CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
    TEXT           = <any OCTET except CTLs, but including LWS>

    token          = 1*<any CHAR except CTLs or separators>
    separators     = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\"
                   | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT

    quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
    qdtext         = <any TEXT except <">>
    quoted-pair    = "\" CHAR

As you can see, field-content can be a quoted-string, which is a quoted sequence of TEXT (i.e. any 8-bit octet except " and values in the ranges [0-8, 11-12, 14-31, 127]) or quoted-pair (\ followed by any value in the [0, 127] range). In other words, any 8-bit character sequence can be passed by quoting it and prefixing special symbols with \.

(Note that the standard doesn't treat the NUL (0x00) character in any special way.)
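To make this reading of the grammar concrete, here is a minimal Python sketch that transcribes the quoted-string, qdtext, quoted-pair and LWS rules into a byte-level regular expression. The function name is hypothetical, and I resolve one ambiguity by assumption: a backslash is always treated as the start of a quoted-pair, never as plain qdtext.

```python
import re

# RFC 2616 section 2.2 rules as byte-level regex fragments (a sketch).
# LWS = [CRLF] 1*( SP | HT ); the repetition comes from the outer "*".
LWS = rb'(?:\r\n)?[ \t]'
# qdtext = any TEXT except <">. Assumption: "\" is left to quoted-pair.
# SP is excluded here only because the LWS fragment already covers it.
QDTEXT = rb'[^\x00-\x20\x7f"\\]'
# quoted-pair = "\" CHAR, where CHAR is any octet in 0-127 (NUL included).
QUOTED_PAIR = rb'\\[\x00-\x7f]'

QUOTED_STRING = re.compile(
    rb'"(?:' + QUOTED_PAIR + rb'|' + QDTEXT + rb'|' + LWS + rb')*"'
)


def is_rfc2616_quoted_string(value: bytes) -> bool:
    """Return True if value is a complete quoted-string under the rules above."""
    return QUOTED_STRING.fullmatch(value) is not None
```

Under this reading, is_rfc2616_quoted_string('"café"'.encode('utf-8')) is accepted, and so is a NUL octet as long as it is preceded by a backslash, while a bare NUL is rejected.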

But obviously either all the servers I tried are non-conforming, or the standard has changed since 1999, or I am misreading it.

So... which characters are allowed in HTTP header values and why?

P.S. The reason behind all of this: I am looking for a way to pass a UTF-8-encoded sequence in an HTTP header value (without additional encoding, if possible).

C.M. asked Dec 07 '17 03:12


1 Answer

RFC 2616 is obsolete; the relevant part has been replaced by RFC 7230.

RFC 7230 summarizes the change in its list of changes from RFC 2616:

"The NUL octet is no longer allowed in comment and quoted-string text, and handling of backslash-escaping in them has been clarified. The quoted-pair rule no longer allows escaping control characters other than HTAB. Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed). (Section 3.2.6)"
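As a rough illustration of what this means at the octet level, the sketch below classifies a raw field value against the RFC 7230 character classes: VCHAR (0x21-0x7E), SP and HTAB are fine, obs-text (0x80-0xFF) is still parseable but obsolete and opaque, and everything else is rejected. The function name and the three labels are hypothetical, and the sketch deliberately ignores obs-fold and the trimming of surrounding whitespace.

```python
def classify_field_value(value: bytes) -> str:
    """Classify a header field value by the octets it contains (a sketch).

    Returns "ascii" if only VCHAR, SP and HTAB occur, "obs-text" if any
    octet in 0x80-0xFF occurs, and "invalid" for control characters
    such as NUL, CR, LF or DEL.
    """
    has_obs_text = False
    for octet in value:
        if octet >= 0x80:
            # obs-text: tolerated by the grammar, but opaque to recipients.
            has_obs_text = True
        elif not (0x21 <= octet <= 0x7E or octet in (0x20, 0x09)):
            # Any other control character is not allowed in a field value.
            return "invalid"
    return "obs-text" if has_obs_text else "ascii"
```

For example, a UTF-8 encoded "café" lands in the "obs-text" bucket: it may pass through some implementations, but nothing defines how a recipient should interpret those octets.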

In essence, RFC 2616 defaulted to ISO-8859-1, and this was both insufficient and not interoperable anyway. Thus, RFC 7230 has deprecated non-ASCII octets in field values. The recommendation is to use an escaping mechanism on top of that (such as defined in RFC 8187, or plain URI-percent-encoding).
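For the UTF-8 use case, a minimal sketch of the RFC 8187 ext-value form (charset, optional language tag, then percent-encoded octets) could look like the following. The helper names are hypothetical; the only library calls are urllib.parse.quote and unquote, with the extra safe characters chosen to match the RFC 8187 attr-char set.

```python
from urllib.parse import quote, unquote

# attr-char characters beyond the letters, digits and "_.-~" that
# quote() always leaves unescaped.
ATTR_CHAR_EXTRA = "!#$&+^`|"


def encode_ext_value(text: str, language: str = "") -> str:
    """Build an ext-value: charset "'" [ language ] "'" value-chars."""
    encoded = quote(text, safe=ATTR_CHAR_EXTRA, encoding="utf-8")
    return "UTF-8'" + language + "'" + encoded


def decode_ext_value(ext_value: str) -> str:
    """Reverse encode_ext_value; the language tag is parsed but ignored."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset)
```

The result is plain ASCII and goes into a parameter whose name carries a trailing asterisk, for example filename*=UTF-8''%E2%82%AC%20rates in a Content-Disposition header. For headers that do not define such parameters, applying plain percent-encoding to the whole value is the simpler option, provided both ends agree on it.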

Julian Reschke answered Sep 25 '22 11:09