Parsing HTTP Header Values: Quoting, RFC 5987, MIME, etc.

What confuses me is the decoding of HTTP header values.

Example Header:
Some-Header: "quoted string?"; *utf-8'en'Weirdness

Can header values be quoted? If so, how is a " itself encoded? Is ' a valid quote character? What's the significance of a semicolon (;)? Could the value parser for an HTTP header be considered a MIME parser?

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That's why I need so much detail on the format.

asked Dec 22 '11 at 20:12 by unixman83


1 Answer

Can header values be quoted?

If you mean does the RFC 5987 parameter production apply to the main part of the header value, then no.

Some-Header: "foo"; bar*=utf-8'en'bof

Here the main part of the header value would probably be "foo", including the quotes, but even that depends on how the particular header is defined.

What's the significance of a semi-colon (;)?

The specific handling is defined separately for each named header. So the semicolon is significant for, say, Content-Disposition, but not for Content-Length.
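For headers that do use the `;` parameter syntax, such as Content-Type and Content-Disposition, splitting has to respect quoted-strings: a `;` inside quotes is data, and inside a quoted-string a literal `"` is written as `\"`. Below is a minimal Python sketch under that assumption; `split_params` and `_unquote` are hypothetical helper names, and the function should only be applied to headers you already know follow this grammar. It deliberately returns the main part verbatim and leaves `name*` values undecoded.

```python
import re


def _unquote(quoted: str) -> str:
    """Strip the surrounding quotes of a quoted-string and undo backslash escapes."""
    return re.sub(r"\\(.)", r"\1", quoted[1:-1])


def split_params(value: str) -> tuple[str, list[tuple[str, str]]]:
    """Split 'main; name=value; name2="a; b"' into the main part and parameters.

    Semicolons inside a quoted-string do not split, and an escaped quote (\\")
    does not end the string. The main part is returned verbatim.
    """
    pieces: list[str] = []
    buf: list[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            pieces.append("".join(buf).strip())
            buf = []
            continue                 # the separator itself is dropped
        buf.append(ch)
    pieces.append("".join(buf).strip())

    main, params = pieces[0], []
    for piece in pieces[1:]:
        if not piece:
            continue                 # tolerate stray ';;'
        name, _, val = piece.partition("=")
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = _unquote(val)
        params.append((name.strip().lower(), val))
    return main, params
```

Applied to the example above, `split_params('"foo"; bar*=utf-8\'en\'bof')` returns `('"foo"', [('bar*', "utf-8'en'bof")])`: the main part keeps its quotes, and the extended value is left for a separate decoding step.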

Obviously this is not a very satisfactory solution, but that's what we're stuck with.

I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields.

You can't handle these in a generic way; you have to know the form of each possible header. For anything you don't recognise, don't attempt to decompose the header value. And really, so little out there supports RFC 5987 at the moment that it's unlikely you'll be able to do much useful handling of it.
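One way to follow that advice in a proxy is a whitelist of header names whose grammar you have actually implemented, with every other header passed through byte-for-byte. The Python sketch below is purely illustrative: the `REWRITERS` table, `force_attachment`, and `rewrite_headers` are hypothetical names, and the single rewrite it performs (forcing Content-Disposition to `attachment`) is just an example policy.

```python
from typing import Callable

Rewriter = Callable[[str], str]


def force_attachment(value: str) -> str:
    """Content-Disposition: replace the disposition type, keep parameters as-is.

    In a well-formed header the disposition type is a token, so it cannot
    contain quotes and the first ';' is the real separator. Everything after
    it is copied verbatim, without decomposing the parameters.
    """
    _type, sep, rest = value.partition(";")
    return "attachment" + sep + rest


# Hypothetical whitelist: only headers whose grammar we implement get touched.
REWRITERS: dict[str, Rewriter] = {
    "content-disposition": force_attachment,
}


def rewrite_headers(headers: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Apply known rewriters; pass unrecognised headers through unchanged."""
    out = []
    for name, value in headers:
        rewriter = REWRITERS.get(name.lower())  # header names are case-insensitive
        out.append((name, rewriter(value) if rewriter else value))
    return out
```

A header such as the question's `Some-Header` is not in the table, so it comes out exactly as it went in, however odd its contents.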

The status quo today is that non-ASCII characters in header values don't work well enough cross-browser to be used at all, either encoded or raw.

Luckily they are rarely needed. The only really common use case is non-ASCII filenames in Content-Disposition, but that's easier to work around by putting the filename in a trailing URL path part instead.
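A minimal sketch of that workaround, assuming a hypothetical `/download/<id>/<filename>` URL layout in which the server routes on the id and ignores the last path segment. The filename is percent-encoded as UTF-8 so the URL itself stays ASCII; the response would then be sent with `Content-Disposition: attachment` and no `filename` parameter, leaving the browser to take the save-as name from the URL. How faithfully each browser decodes that segment is not something this sketch can guarantee.

```python
from urllib.parse import quote


def download_url(file_id: str, filename: str) -> str:
    """Build a download URL whose trailing path segment carries the filename.

    Hypothetical layout: /download/<id>/<filename>. The server routes on <id>
    only; the last segment exists so the browser uses it as the save-as name.
    safe="" also percent-encodes any '/' in the name.
    """
    return f"/download/{quote(file_id, safe='')}/{quote(filename, safe='')}"


print(download_url("42", "naïve café.pdf"))
# /download/42/na%C3%AFve%20caf%C3%A9.pdf
```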

Could the value parser for a HTTP header be considered a MIME parser?

No. HTTP borrows heavily from MIME and the RFC 822 family of standards in general, but it isn't part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but isn't quite compatible. Arbitrary MIME features can't be used in HTTP; there has to be a standardisation mechanism that drags them into HTTP explicitly. That is what RFC 5987 is, for (parts of) RFC 2231.
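For reference, decoding an RFC 5987 ext-value (the `charset'language'percent-encoded` form that follows `name*=`, as in `bar*=utf-8'en'bof` earlier) could look roughly like the Python sketch below. `decode_ext_value` is a hypothetical helper; it accepts only UTF-8 and ISO-8859-1, the two charsets RFC 5987 requires recipients to support, and rejects anything else rather than guessing.

```python
from urllib.parse import unquote_to_bytes

SUPPORTED = {"utf-8", "iso-8859-1"}


def decode_ext_value(ext_value: str) -> tuple[str, str, str]:
    """Decode charset'language'pct-encoded into (charset, language, text)."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3:
        raise ValueError(f"not an ext-value: {ext_value!r}")
    charset, language, encoded = parts
    if charset.lower() not in SUPPORTED:
        raise ValueError(f"unsupported charset: {charset!r}")
    # Percent-decode to raw bytes first, then apply the declared charset.
    text = unquote_to_bytes(encoded).decode(charset.lower())
    return charset, language, text
```

For example, `decode_ext_value("UTF-8''%e2%82%ac%20rates")` gives `("UTF-8", "", "€ rates")`; the language tag is optional and may be empty.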

(See section 19.4 of RFC 2616 for discussion of some other differences.)

In theory, a multipart form submission is part of the 822 family and you should be able to use RFC 2231 encoding there. But the reality is browsers don't support that either.

answered Sep 20 '22 at 00:09 by bobince