
A solution idea for incremental updates using browser cache

Some time ago I asked how to do Incremental updates using browser cache. Here I'm giving a short summary of the problem - for more context, especially the reason why I want to do this, please refer to the old question. I'd like you to review and improve my solution idea (just an idea, so don't send me to code review :D).

The problem

The client (a single page app) gets rather big lists from the server. This works fine and actually saves server resources as

  • the same list can be served to multiple clients
  • and the clients do the filtering and sorting without bothering the server again and again.

Some of these lists are user-specific, others are common to a group of users, others are global. All these lists may change at any time and we never want to serve stale data (so the Cache-Control and Expires HTTP headers are of no direct use here).

We're using 304 NOT MODIFIED, which helps when nothing has changed. When anything does change, the changes are usually small, but HTTP does not support this case at all, so we have to send the whole list including the unchanged parts. We could send the delta instead, but there's no obvious way for the browser to cache it efficiently (caching in localStorage or the like is by far not as good, as I explained in my linked question).

An important property of our lists is that every item has a unique id and a last modified timestamp. The timestamp allows us to compute the delta easily by finding the items that have changed recently. The id allows us to apply the delta simply by replacing the corresponding items (the list is internally a Map<Id, Item>). This wouldn't work for deletions, but let's ignore them for now.
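A minimal sketch of the two operations just described, written in Python purely for illustration (the real client is a browser app). The names `Item`, `compute_delta` and `apply_delta` are made up for this example, and the timestamp is assumed to be a plain integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    modified: int  # last-modified timestamp (assumed: integer seconds)
    payload: str


def compute_delta(items, since):
    """Server side: all items modified strictly after `since`."""
    return [item for item in items.values() if item.modified > since]


def apply_delta(items, delta):
    """Client side: insert new items and replace changed ones, keyed by id."""
    merged = dict(items)
    for item in delta:
        merged[item.id] = item
    return merged


# The client synced at time 100 and holds a copy of the server's map.
server = {1: Item(1, 100, "a"), 2: Item(2, 100, "b")}
client = dict(server)

# Later, one item changes and one is added on the server.
server[2] = Item(2, 150, "b2")
server[3] = Item(3, 160, "c")

delta = compute_delta(server, since=100)
client = apply_delta(client, delta)
```

As noted above, a deleted item never shows up in such a delta, so this sketch cannot remove anything from the client's map.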

The idea

I'm suggesting using multiple lists (any number should work) of varying sizes, with the bigger lists cacheable for a longer time. Let's assume a day is a suitable time unit and let's use the following three lists:

  • WEEK This is the base list containing all items as they existed at some arbitrary time in the current week.

  • DAY A list containing all items which have changed this week except today as they existed at some arbitrary time in the current day. Items changed today may or may not be included.

  • CURRENT A list containing all items which have changed today as they exist just now.

The client gets all three lists. It starts with WEEK, applies DAY (i.e., inserts new items and replaces old ones) and finally applies CURRENT.
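The client-side merge can be sketched as follows, again in Python for illustration only. Each list is modelled as a plain dict from id to item, and `merge_layers` is a hypothetical helper name. The only thing that matters is the order: later (fresher) lists overwrite earlier ones.

```python
def merge_layers(*layers):
    """Apply the layers in order; entries in later layers win."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result


# WEEK: every item as it existed at some point this week.
week = {1: "a v1", 2: "b v1", 3: "c v1"}
# DAY: items changed earlier this week (item 2 changed yesterday).
day = {2: "b v2"}
# CURRENT: items changed today, as they exist right now.
current = {3: "c v2", 4: "d v1"}

view = merge_layers(week, day, current)
```

Because every layer is applied the same way, the same function works for any number of lists.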

An example

Let's assume there are 1000 items in the list with 10 items changing per day.

The WEEK list contains all 1000 items, but it can be cached until the end of the week. Its exact content is not specified and different clients may have different versions of it (as long as the condition from the above bullet holds). This allows the server to cache the data for a whole week, but it also allows it to drop them as serving the current state is fine, too.

The DAY list contains up to 70 items and can be cached until the end of a day.

The CURRENT list contains up to 10 items and can only be cached until anything changes.
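To put rough numbers on this, here is a back-of-the-envelope calculation in Python. It rests on assumptions that are not in the description above: every change touches a different single item, a client refetches after every change, and the client starts the week with an empty cache. The upper bound of 70 for DAY is taken from the text.

```python
TOTAL_ITEMS = 1000
CHANGES_PER_DAY = 10
DAYS_PER_WEEK = 7

# Plain approach: the whole list is transferred after every change.
plain_per_day = CHANGES_PER_DAY * TOTAL_ITEMS
plain_per_week = DAYS_PER_WEEK * plain_per_day

# Layered approach, per day:
#   DAY is fetched once (at most 70 items),
#   CURRENT is refetched after each change and grows from 1 to 10 items.
day_list_max = DAYS_PER_WEEK * CHANGES_PER_DAY
current_total = sum(range(1, CHANGES_PER_DAY + 1))
layered_per_day = day_list_max + current_total

# Per week, WEEK is fetched once on top of the daily cost.
layered_per_week = TOTAL_ITEMS + DAYS_PER_WEEK * layered_per_day

print(plain_per_week, layered_per_week)
```

Under these assumptions a client transfers at most 1875 items per week instead of 70000.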

The communication

The client should know nothing about the time scale used, but it needs to know the number of lists to ask for. A "classical" request like

GET /api/order/123      // get the whole list with up to date content

will be replaced by three requests like

GET /api/0,order/123    // get the WEEK list
GET /api/1,order/123    // get the DAY list
GET /api/2,order/123    // get the CURRENT list
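A sketch of both sides of this exchange, in Python for illustration. `list_urls` is the client part and only knows the number of lists. `cache_control` is a hypothetical server part that turns "cacheable until the end of the day/week" into a `Cache-Control` value. Assumptions not taken from the text: the week starts on Monday 00:00 UTC, and CURRENT is served with `no-cache` so that it is revalidated (for example with a 304) on every use.

```python
from datetime import datetime, timedelta, timezone


def list_urls(resource, levels=3):
    """Client side: one URL per list, numbered from the biggest list."""
    return [f"/api/{level},{resource}" for level in range(levels)]


def seconds_until_expiry(level, now):
    """Server side: remaining lifetime of a list at the given level."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if level == 0:  # WEEK: valid until next Monday 00:00 (assumed week start)
        days_after_tomorrow = 6 - now.weekday()
        end = next_midnight + timedelta(days=days_after_tomorrow)
        return int((end - now).total_seconds())
    if level == 1:  # DAY: valid until the next midnight
        return int((next_midnight - now).total_seconds())
    return 0  # CURRENT: never served from cache without revalidation


def cache_control(level, now):
    max_age = seconds_until_expiry(level, now)
    return f"max-age={max_age}" if max_age > 0 else "no-cache"


now = datetime(2018, 5, 4, 12, 0, tzinfo=timezone.utc)  # a Friday, noon UTC
for level, url in enumerate(list_urls("order/123")):
    print(url, cache_control(level, now))
```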

The questions

Usually the changes are indeed as described, but sometimes all items change at once. When this happens, all three lists contain all items, meaning that we have to serve three times as much data. Fortunately, such events are very rare (e.g., when we add an attribute), but I'd like to see a way to avoid such bursts.

Do you see any other problems with this idea?

Is there any solution for deletions apart from just marking the items as deleted and postponing the physical deletion until the caches expire (i.e., until the end of the week in my example)?
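For reference, the marking approach mentioned in the last question could look roughly like this on the client (Python for illustration). Using `None` as the deletion marker and the names `TOMBSTONE`, `merge_layers` and `visible_items` are choices made for this sketch only.

```python
TOMBSTONE = None  # assumed marker for "this item has been deleted"


def merge_layers(*layers):
    """Apply the layers in order; later entries, including markers, win."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result


def visible_items(merged):
    """Drop the deletion markers only after all layers have been applied."""
    return {
        item_id: item
        for item_id, item in merged.items()
        if item is not TOMBSTONE
    }


week = {1: "a", 2: "b"}
day = {2: TOMBSTONE}  # item 2 was deleted earlier this week
current = {3: "c"}

view = visible_items(merge_layers(week, day, current))
```

The marker has to survive the merge so that it can override the stale entry in WEEK, which is why the server must keep it until the WEEK list expires.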

Any improvements?

asked May 04 '18 00:05 by maaartinus


1 Answer

Yes, I see big problems with this. That it is a big list implies that the client has a lot of work to do to pull down the resources it needs. That has a big impact on performance.

"All these lists may change anytime and we never want to serve stale data"

So you should be using long cache times and cache-busting urls.
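One common way to build a cache-busting URL is to derive a version token from the content itself, so the URL changes exactly when the list changes. The sketch below is in Python, and the `v` query parameter, the `versioned_url` helper and the 12-character token length are all assumptions of this example, not something prescribed by the answer.

```python
import hashlib
import json

# A versioned URL never changes its content, so it can be cached for a long time.
LONG_LIVED = "max-age=31536000, immutable"


def versioned_url(base_url, items):
    """Append a token derived from the serialized list to the URL."""
    body = json.dumps(items, sort_keys=True).encode("utf-8")
    version = hashlib.sha256(body).hexdigest()[:12]
    return f"{base_url}?v={version}"


before = versioned_url("/api/order/123", {"1": "a", "2": "b"})
after = versioned_url("/api/order/123", {"1": "a", "2": "b2"})
print(before, after, LONG_LIVED)
```

The client still needs some uncached (or revalidated) way to learn the current URL, for example a small index response.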

"We're using 304 NOT MODIFIED"

That's about the worst possible way to address the problem. Most of the cost of retrieval is in latency. If you are replying with a 304 response then you've already paid most of that cost, and this is particularly pronounced when you are dealing with small pieces of data. HTTP/2 helps (compared with 1.0 and 1.1) but doesn't eliminate the cost.

I would also question a lot of the assumptions made in your original question.

answered Sep 28 '22 06:09 by symcbean