Some time ago I asked how to do Incremental updates using browser cache. Here I'm giving a short summary of the problem - for more context, especially the reason why I want to do this, please refer to the old question. I'd like you to review and improve my solution idea (just an idea, so don't send me to code review :D).
The client (a single page app) gets rather big lists from the server. This works fine and actually saves server resources. In short:
- Some of these lists are user-specific, others are common to a group of users, others are global.
- All these lists may change anytime and we never want to serve stale data (the Cache-Control and Expires HTTP headers are of no direct use here).
- We're using 304 NOT MODIFIED, which helps when nothing has changed.
- When anything changes, the changes are usually small, but HTTP does not support this case at all, so we have to send the whole list including the unchanged parts.
- We can send the delta instead, but there's no obvious way for the browser to cache it efficiently (caching in localStorage or the like is by far not as good, as I explained in the linked question).
- An important property of our lists is that every item has a unique id and a last modified timestamp.
- The timestamp allows us to compute the delta easily by finding the items that have changed recently.
- The id allows us to apply the delta simply by replacing the corresponding items (the list is internally a Map<Id, Item>); see the sketch below.
- This wouldn't work for deletions, but let's ignore them for now.
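To make the delta mechanics concrete, here is a minimal sketch in TypeScript (the Item shape and the helper names computeDelta / applyDelta are mine, not an existing API):

```typescript
// Assumed item shape: a unique id plus a last-modified timestamp (epoch millis).
interface Item {
  id: string;
  lastModified: number;
  // ...payload fields omitted
}

// Server side: the delta is simply every item modified at or after the cutoff.
function computeDelta(allItems: Item[], since: number): Item[] {
  return allItems.filter(item => item.lastModified >= since);
}

// Client side: applying a delta means inserting new items and replacing
// existing ones, keyed by id. Deletions are ignored here, as stated above.
function applyDelta(list: Map<string, Item>, delta: Item[]): Map<string, Item> {
  for (const item of delta) {
    list.set(item.id, item);
  }
  return list;
}
```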
I'm suggesting using multiple lists (any number should work) of varying sizes, with the bigger lists cacheable for a long time. Let's assume a day is a suitable time unit and use the following three lists:
- WEEK: The base list, containing all items as they existed at some arbitrary time in the current week.
- DAY: A list containing all items which have changed this week (except today), as they existed at some arbitrary time in the current day. Items changed today may or may not be included.
- CURRENT: A list containing all items which have changed today, as they exist just now.
The client gets all three lists. It starts with WEEK, applies DAY (i.e., inserts new items and replaces old ones), and finally applies CURRENT.
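In code, the composition is just three such apply steps in order; a minimal sketch (only the id matters for merging):

```typescript
interface Item {
  id: string;
}

// Build the up-to-date view: start from WEEK, then let DAY and CURRENT
// overwrite whatever they contain. Later lists win, so order matters.
function buildList(week: Item[], day: Item[], current: Item[]): Map<string, Item> {
  const merged = new Map<string, Item>();
  for (const list of [week, day, current]) {
    for (const item of list) {
      merged.set(item.id, item);
    }
  }
  return merged;
}
```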
Let's assume there are 1000 items in the list with 10 items changing per day.
- The WEEK list contains all 1000 items, but it can be cached until the end of the week. Its exact content is not specified, and different clients may have different versions of it (as long as the condition from the definition above holds). This allows the server to cache the data for a whole week, but it also allows it to drop the data, as serving the current state is fine, too.
- The DAY list contains up to 70 items and can be cached until the end of the day.
- The CURRENT list contains up to 10 items and can only be cached until anything changes.
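In HTTP terms these lifetimes could be expressed by deriving max-age from the time left until the list's boundary, while CURRENT is always revalidated (ETag / 304). A sketch, assuming day and week boundaries aligned to UTC midnight and Monday (where exactly the boundaries lie is an open choice):

```typescript
// Seconds until the next UTC midnight: lifetime of the DAY list.
function secondsUntilEndOfDay(now: Date = new Date()): number {
  const endOfDay = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return Math.floor((endOfDay - now.getTime()) / 1000);
}

// Seconds until the next Monday 00:00 UTC: lifetime of the WEEK list.
function secondsUntilEndOfWeek(now: Date = new Date()): number {
  const daysLeft = 7 - ((now.getUTCDay() + 6) % 7); // Monday-based week
  const endOfWeek = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + daysLeft);
  return Math.floor((endOfWeek - now.getTime()) / 1000);
}

// Cache-Control value per list index (0 = WEEK, 1 = DAY, 2 = CURRENT).
function cacheControlFor(listIndex: number): string {
  switch (listIndex) {
    case 0: return `max-age=${secondsUntilEndOfWeek()}`;
    case 1: return `max-age=${secondsUntilEndOfDay()}`;
    default: return 'no-cache'; // always revalidate; answer with 304 when unchanged
  }
}
```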
The client should know nothing about the time scale used, but it needs to know the number of lists to ask for. A "classical" request like
GET /api/order/123 // get the whole list with up to date content
will be replaced by three requests like
GET /api/0,order/123 // get the WEEK list
GET /api/1,order/123 // get the DAY list
GET /api/2,order/123 // get the CURRENT list
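On the client, issuing the three requests could look roughly like this (a sketch only; error handling and the response type are simplified, and the URL layout is the one shown above):

```typescript
// Fetch the WEEK (0), DAY (1) and CURRENT (2) lists in parallel. The browser
// cache answers the first two most of the time; CURRENT is usually revalidated.
async function fetchLists(orderId: number): Promise<unknown[][]> {
  return Promise.all(
    [0, 1, 2].map(async index => {
      const response = await fetch(`/api/${index},order/${orderId}`);
      if (!response.ok) {
        throw new Error(`list ${index} failed with HTTP ${response.status}`);
      }
      return (await response.json()) as unknown[];
    })
  );
}
```

The three arrays would then be merged in order as sketched above.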
Usually the changes are indeed as described, but sometimes all items change at once. When this happens, all three lists contain all items, meaning that we have to serve three times as much data. Fortunately, such events are very rare (e.g., when we add an attribute), but I'd like to see a way that allows us to avoid such bursts.
Do you see any other problems with this idea?
Is there any solution for deletions apart from just marking the items as deleted and postponing the physical deletion until the caches expire (i.e., until the end of the week in my example)? The marking variant is sketched below.
Any improvements?
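For completeness, the "mark as deleted" variant mentioned above could look like this on the client; the deleted flag is just one possible representation of a tombstone:

```typescript
interface Item {
  id: string;
  lastModified: number;
  deleted?: boolean; // tombstone; physically removed only after the caches expire
}

// Same apply step as before, but a tombstone removes the item instead of replacing it.
function applyDeltaWithTombstones(list: Map<string, Item>, delta: Item[]): Map<string, Item> {
  for (const item of delta) {
    if (item.deleted) {
      list.delete(item.id);
    } else {
      list.set(item.id, item);
    }
  }
  return list;
}
```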
Yes, I see big problems with this. That it is a big list implies that the client has a lot of work to do to pull down the resources it needs, and that has a big impact on performance.
"All these lists may change anytime and we never want to serve stale data"
So you should be using long cache times and cache-busting URLs.
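By that I mean something along these lines (a sketch of the cache-busting idea only; the /version endpoint and the v query parameter are made up for illustration): the client first asks for the current version of a list via a cheap, uncacheable call, then fetches the list under a URL that embeds that version and can therefore be cached for a very long time.

```typescript
// 1. Ask for the current version of the list; this response must not be cached.
async function currentVersion(orderId: number): Promise<string> {
  const response = await fetch(`/api/order/${orderId}/version`, { cache: 'no-store' });
  return (await response.json()).version as string;
}

// 2. Fetch the list under a versioned URL. Because the URL changes whenever the
//    content changes, the server can reply with a very long max-age (even immutable).
async function fetchList(orderId: number): Promise<unknown[]> {
  const version = await currentVersion(orderId);
  const response = await fetch(`/api/order/${orderId}?v=${version}`);
  return (await response.json()) as unknown[];
}
```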
"We're using 304 NOT MODIFIED"
That's about the worst possible way to address the problem. Most of the cost of retrieval is in latency. If you are replying with a 304 response, then you've already incurred most of the cost; this will be particularly pronounced when you are dealing with small pieces of data. HTTP/2 helps (compared with 1.0 and 1.1) but doesn't eliminate the cost.
I would also question a lot of the assumptions made in your original question.