
How do I deal with different requests that map to the same response?

Tags: http, caching

I'm designing a Web service. The request is idempotent, so I chose the GET method. The response is relatively expensive to calculate and not small, so I want to get caching (at the protocol level) right. (Don't worry about memoisation on my side; I already have that covered. My question is about caching on the Web as a whole.)

There's only one mandatory parameter and a number of optional parameters that take default values if missing. For example, the following two requests map to the same representation of the response. (If this is a dumb way to design the interface, propose something better.)

GET /service?mandatory_parameter=some_data HTTP/1.1
GET /service?mandatory_parameter=some_data;optional_parameter=default1;another_optional_parameter=default2;yet_another_optional_parameter=default3 HTTP/1.1

However, I imagine clients do not know this and would treat them as separate resources, and therefore waste cache storage. What should I do to avoid violating the golden rule of caching?

  1. Make up a canonical form, document it (e.g. all parameters are required after all and need to be sorted in a specific order) and return a client error unless the required form is met?
  2. Instead of an error, redirect permanently to the canonical form of a request?
  3. Or is it enough to not mind what the request looks like, and just respond with the same ETag for identical responses?
daxim asked Apr 30 '10 13:04



2 Answers

First, don't use semicolons as a delimiter in a query string. Use ? to begin the query string and & to delimit key/value pairs. RFC 3986 doesn't explicitly require &, but the vast majority of existing code uses it because of the application/x-www-form-urlencoded precedent.

Second, you're right that different parameters in a query string result in a different URI and thus, as far as caches are concerned, a different resource. Assuming you want optimal caching performance, if an optional parameter has been specified but its inclusion does not affect the representation that will be transmitted (that is, it is given with its default value), you should redirect to a canonical URI that omits the parameter. For example, http://example.com:80/ can be normalized to http://example.com/ because 80 is the default port for HTTP. You can do the same for query parameters since you control the URI space. If parameters (optional or otherwise) appear in an order other than the canonical order, you should redirect for that too.

A 301 redirect is preferred if you know that the relationship between the URIs will be stable. Otherwise, use a 302 or 307 redirect as appropriate.

I would recommend defining your canonical form the same way that OAuth does: sort the parameters alphabetically, first by key, then by value. Other normalization operations will also help here; RFC 3986 has an entire section on URI normalization that will be relevant to you. This technique really only works for GET; redirects on PUT/POST/DELETE are not generally recommended.
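As a rough illustration of the canonicalization described above, here is a minimal Python sketch. It assumes the parameter names and default values from the question; the `DEFAULTS` table and the helper names `canonical_query` and `redirect_target` are made up for this example and are not part of any library.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical defaults for the optional parameters named in the question.
DEFAULTS = {
    "optional_parameter": "default1",
    "another_optional_parameter": "default2",
    "yet_another_optional_parameter": "default3",
}


def canonical_query(pairs):
    """Build the canonical query string from a list of (key, value) pairs.

    Optional parameters given with their default value are dropped; the
    remaining pairs are sorted by key, then by value.
    """
    kept = [(k, v) for k, v in pairs if DEFAULTS.get(k) != v]
    return urlencode(sorted(kept))


def redirect_target(path, raw_query):
    """Return the canonical URI to redirect to, or None if already canonical."""
    canonical = canonical_query(parse_qsl(raw_query, keep_blank_values=True))
    if canonical == raw_query:
        return None
    return f"{path}?{canonical}" if canonical else path
```

A server would call `redirect_target` before doing any expensive work and answer with a 301 and a `Location` header whenever it returns a URI. Whether comparing against the raw query string is strict enough (for instance, with respect to percent-encoding variants) depends on how much normalization you want to enforce.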

Third, ETags are great, and they provide a huge performance improvement if implemented well by both the client and the server. Unfortunately, it's rare for both sides to do it right. Ditto for Last-Modified. You should pursue these, because the CPU and bandwidth savings are significant when they work, but they are not sufficient on their own. Other headers like Cache-Control are also frequently necessary. It's worth familiarizing yourself with Section 13 of RFC 2616 (since superseded by RFC 9111) if you're planning on going into great detail on this stuff.
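To make the ETag and Cache-Control interplay concrete, here is a framework-free Python sketch of a conditional GET. The function names `make_etag` and `respond`, the SHA-256 choice, and the one-hour `max_age` are assumptions for illustration only. Note that hashing the body saves bandwidth but not CPU; to skip the expensive computation as well, the ETag would have to be derived from something cheaper, such as a version identifier of the inputs.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(
    body: bytes, if_none_match: Optional[str] = None, max_age: int = 3600
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    # If-None-Match may carry several comma-separated tags, or "*".
    candidates = [t.strip() for t in (if_none_match or "").split(",")]
    # The header is compared weakly, so a W/ prefix on the client's tag is ignored.
    stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
    if "*" in candidates or etag in stripped:
        return 304, headers, b""
    return 200, headers, body
```

A client that sends back the ETag it received gets an empty 304 instead of the full body, while `Cache-Control` lets caches serve the response without contacting the server at all until it goes stale.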

Finally, a word of warning: there is an issue with these redirects you need to be aware of. Clients trying to access your resources may frequently be redirected to other locations. This introduces overhead that pays off only if clients make subsequent requests against the same resource and remember the redirect so that they avoid it next time. Unless you've open-sourced a reference client implementation that takes advantage of your caching optimizations, you may never benefit from these tweaks.

Bob Aman answered Oct 16 '22 23:10



I would pick option (2) in your list, and I would make the request RESTful rather than RPC-like.

That is, in this case, make all of the parameters part of the request path:

/service/mandatory_parameter/some_data/optional_parameter/default1/another_optional_parameter/default2/yet_another_optional_parameter/default3

In the case where not all of the optional parameters are specified, return a 301 (Moved Permanently) to the full resource name with the defaults filled in. This will (or should) be cached by clients and web caches appropriately, and even if the request reaches your backend, generating the 301 should be very cheap for you.

At that point, you have one canonical form for the URI, and caching will work as expected.
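A minimal Python sketch of this path-based scheme might look as follows. It assumes the segment order and default values from the example path; `ORDER`, `DEFAULTS`, `canonical_path`, and `redirect_location` are hypothetical names chosen for illustration.

```python
from urllib.parse import quote, unquote

# Fixed segment order and hypothetical defaults, taken from the example path.
ORDER = [
    "mandatory_parameter",
    "optional_parameter",
    "another_optional_parameter",
    "yet_another_optional_parameter",
]
DEFAULTS = {
    "optional_parameter": "default1",
    "another_optional_parameter": "default2",
    "yet_another_optional_parameter": "default3",
}


def canonical_path(path):
    """Return the full path with every parameter present, in the fixed order."""
    segments = [unquote(s) for s in path.split("/") if s]
    # Expect "service" followed by alternating name/value segments.
    if not segments or segments[0] != "service" or len(segments) % 2 == 0:
        raise ValueError("malformed service path")
    given = dict(zip(segments[1::2], segments[2::2]))
    if "mandatory_parameter" not in given or set(given) - set(ORDER):
        raise ValueError("missing or unknown parameter")
    values = {**DEFAULTS, **given}
    return "/service/" + "/".join(f"{k}/{quote(values[k], safe='')}" for k in ORDER)


def redirect_location(path):
    """Return the Location for a 301, or None if the path is already canonical."""
    target = canonical_path(path)
    return None if target == path else target
```

Because the check involves only string handling, it can run before any expensive computation, which is what keeps the 301 cheap.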

This does mean that every combination of parameters will be cached separately (as a 301). That's fine, though: the non-canonical requests will have a cache policy independent of the full request, and clients that are worried about the extra round trip can fill in all the parameters themselves.

Your option (3) won't work as you expect: each form will be cached independently, because they're different URIs.

It should also be noted that a lot of downstream caches and other software won't cache your response at all because of the query parameters, which is why I suggest turning it into a 'proper' resource.

user115340 answered Oct 16 '22 23:10
