A solution idea for incremental updates using browser cache


Some time ago I asked how to do incremental updates using the browser cache. Here is a short summary of the problem; for more context, especially the reason why I want to do this, please refer to the old question. I'd like you to review and improve my solution idea (just an idea, so don't send me to code review :D).

The problem

The client (a single page app) gets rather big lists from the server. This works fine and actually saves server resources, because

- the same list can be served to multiple clients
- and the clients do the filtering and sorting without bothering the server again and again.

Some of these lists are user-specific, others are common to a group of users, others are global. All these lists may change at any time and we never want to serve stale data (the Cache-Control and Expires HTTP headers are of no direct use here).

We're using 304 NOT MODIFIED, which helps when nothing has changed. When anything changes, the changes are usually small, but HTTP does not support this case at all, so we have to send the whole list including the unchanged parts. We could send the delta instead, but there's no obvious way to make the browser cache it efficiently (caching in localStorage or similar is far worse, as I explained in my linked question).
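For reference, the server-side decision behind a 304 response can be sketched as follows. This is a minimal sketch under an assumption that is not from the question: the ETag is derived from the item count and the newest timestamp (real servers often hash the serialized body instead). `computeEtag` and `conditionalStatus` are hypothetical names.

```typescript
// Assumed item shape: a unique id and a last-modified timestamp.
interface Item {
  id: number;
  timestamp: number; // e.g. milliseconds since epoch
}

// Hypothetical validator: changes whenever an item is added, removed or
// modified, provided timestamps only ever increase.
function computeEtag(items: Item[]): string {
  let newest = 0;
  for (const item of items) {
    newest = Math.max(newest, item.timestamp);
  }
  return `W/"${items.length}-${newest}"`;
}

// 304 if the client's copy is current, otherwise 200 with the full list.
// Simplification: If-None-Match is compared as a single tag, not a list.
function conditionalStatus(ifNoneMatch: string | undefined, items: Item[]): number {
  return ifNoneMatch !== undefined && ifNoneMatch === computeEtag(items) ? 304 : 200;
}
```

As soon as a single item changes, the tag no longer matches and the whole list is sent again, which is exactly the limitation described above.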

An important property of our lists is that every item has a unique id and a last modified timestamp. The timestamp allows us to compute the delta easily by finding the items that have changed recently. The id allows us to apply the delta simply by replacing the corresponding items (the list is internally a Map<Id, Item>). This wouldn't work for deletions, but let's ignore them for now.
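The two operations described above (finding recently changed items by `timestamp`, replacing them by `id`) can be sketched like this, assuming numeric ids and timestamps. `changedSince` and `applyDelta` are illustrative names, not an existing API.

```typescript
type Id = number;

interface Item {
  id: Id;
  timestamp: number; // last-modified time
  [field: string]: unknown; // remaining payload fields
}

// Server side: the delta is every item modified strictly after `since`.
function changedSince(items: Iterable<Item>, since: number): Item[] {
  const delta: Item[] = [];
  for (const item of items) {
    if (item.timestamp > since) delta.push(item);
  }
  return delta;
}

// Client side: insert new items and replace old ones by id.
// Deletions are not handled here (see the question about them below).
function applyDelta(list: Map<Id, Item>, delta: Item[]): void {
  for (const item of delta) {
    list.set(item.id, item);
  }
}
```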

The idea

I suggest using multiple lists (any number should work) of varying sizes, with bigger lists cacheable for a longer time. Let's assume a day is a suitable time unit and use the following three lists:

- WEEK: the base list containing all items as they existed at some arbitrary time in the current week.
- DAY: a list containing all items which have changed this week before today, as they existed at some arbitrary time during the current day. Items changed today may or may not be included.
- CURRENT: a list containing all items which have changed today, as they exist right now.

The client gets all three lists. It starts with WEEK, applies DAY (i.e., inserts new items and replaces old ones) and finally applies CURRENT.
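The merge step can be sketched as follows, assuming each list arrives as an array of items with a numeric `id`; `mergeLayers` is a hypothetical helper.

```typescript
interface Item {
  id: number;
  timestamp: number;
  [field: string]: unknown;
}

// Layers must be passed oldest first: WEEK, then DAY, then CURRENT.
function mergeLayers(layers: Item[][]): Map<number, Item> {
  const list = new Map<number, Item>();
  for (const layer of layers) {
    for (const item of layer) {
      list.set(item.id, item); // insert new items, replace older versions
    }
  }
  return list;
}
```

Because the layers are applied in order, an item that appears in both DAY and CURRENT (allowed, since DAY may include items changed today) ends up with its CURRENT version.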

An example

Let's assume there are 1000 items in the list with 10 items changing per day.

The WEEK list contains all 1000 items, but it can be cached until the end of the week. Its exact content is not specified and different clients may have different versions of it (as long as the condition from the WEEK bullet above holds). This allows the server to cache the data for a whole week, but it also allows it to drop its cached copy at any time, since serving the current state is fine, too.

The DAY list contains up to 70 items and can be cached until the end of a day.

The CURRENT list contains up to 10 items and can only be cached until anything changes.
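To make the cache lifetimes concrete, here is one way a server could compute them. Assumptions not in the question: days and weeks are counted in UTC, weeks start on Monday, and `cacheControlFor` is a hypothetical helper mapping the layer index (0 = WEEK, 1 = DAY, 2 = CURRENT) to a `Cache-Control` value.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

function startOfUtcDay(now: Date): number {
  return Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
}

// Assumption: weeks start on Monday 00:00 UTC (getUTCDay(): 0 = Sunday).
function startOfUtcWeek(now: Date): number {
  const daysSinceMonday = (now.getUTCDay() + 6) % 7;
  return startOfUtcDay(now) - daysSinceMonday * DAY_MS;
}

function cacheControlFor(layer: number, now: Date): string {
  if (layer === 0 || layer === 1) {
    // WEEK expires at the end of the week, DAY at the end of the day.
    const end = layer === 0 ? startOfUtcWeek(now) + 7 * DAY_MS : startOfUtcDay(now) + DAY_MS;
    const maxAge = Math.max(0, Math.floor((end - now.getTime()) / 1000));
    return `max-age=${maxAge}`;
  }
  // CURRENT may change at any moment: the browser may store it but must
  // revalidate every time (e.g. via ETag and 304).
  return "no-cache";
}
```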

The communication

The client should know nothing about the time scale used, but it needs to know how many lists to ask for. A "classical" request like

GET /api/order/123      // get the whole list with up to date content

will be replaced by three requests like

GET /api/0,order/123    // get the WEEK list
GET /api/1,order/123    // get the DAY list
GET /api/2,order/123    // get the CURRENT list
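On the client, the change amounts to building several URLs and issuing the requests. The sketch below assumes the `/api/<index>,<resource>` shape from the example; `layerUrls`, `loadLayers` and the injected `fetchLayer` function are hypothetical, the latter so the code does not depend on a particular HTTP client.

```typescript
interface Item {
  id: number;
  timestamp: number;
  [field: string]: unknown;
}

// Fetches one layer and parses it into items.
type LayerFetcher = (url: string) => Promise<Item[]>;

// The client only needs the number of layers, not the time scale behind them.
function layerUrls(resource: string, layerCount: number): string[] {
  return Array.from({ length: layerCount }, (_, index) => `/api/${index},${resource}`);
}

// Requests all layers in parallel; Promise.all keeps them in request order,
// i.e. oldest (WEEK) first, so the caller can merge them front to back.
function loadLayers(resource: string, layerCount: number, fetchLayer: LayerFetcher): Promise<Item[][]> {
  return Promise.all(layerUrls(resource, layerCount).map((url) => fetchLayer(url)));
}
```

In a browser, `fetchLayer` could be `(url) => fetch(url).then((response) => response.json())`, so the browser cache can answer the WEEK and DAY requests without contacting the server.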

The questions

Usually the changes are indeed as described, but sometimes all items change at once. When this happens, all three lists contain all items, meaning that we have to serve three times as much data. Fortunately, such events are very rare (e.g., when we add an attribute), but is there a way to avoid such bursts?

Do you see any other problems with this idea?

Is there any solution for deletions apart from just marking the items as deleted and postponing the physical deletion until the caches expire (i.e., until the end of the week in my example)?
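The marking approach mentioned in the question can be sketched as soft deletion with tombstones. Assumptions: a deleted item is kept as `{ id, timestamp, deleted: true }`, where `timestamp` is the deletion time, and weeks are the longest cache unit as in the example; `applyWithTombstones` and `purgeTombstones` are hypothetical names.

```typescript
interface Item {
  id: number;
  timestamp: number; // for tombstones: the time of deletion
  deleted?: boolean;
  [field: string]: unknown;
}

// Client: like applying a delta, but a tombstone removes the entry.
function applyWithTombstones(list: Map<number, Item>, layer: Item[]): void {
  for (const item of layer) {
    if (item.deleted) list.delete(item.id);
    else list.set(item.id, item);
  }
}

// Server: a tombstone older than the start of the current week can be
// dropped, because any WEEK list that may still be cached was built during
// the current week, i.e. after the deletion, and no longer has the live item.
function purgeTombstones(items: Item[], weekStart: number): Item[] {
  return items.filter((item) => !(item.deleted && item.timestamp < weekStart));
}
```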

Any improvements?


asked May 04 '18 00:05

maaartinus

1 Answer

Yes, I see big problems with this. The fact that it is a big list means the client has a lot of work to do to pull down the resources it needs, and that has a big impact on performance.

All these lists may change anytime and we never want to serve stale data

So you should be using long cache times and cache-busting URLs.
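A minimal sketch of what a cache-busting URL could look like, assuming the version token is a hash of the response body. FNV-1a is used only because it fits in a few lines without a library; `fnv1a` and `cacheBustedUrl` are hypothetical helpers, and the `?v=` query parameter is an arbitrary choice.

```typescript
// 32-bit FNV-1a over UTF-16 code units (not UTF-8 bytes, so results differ
// from byte-based FNV-1a for non-ASCII text).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime mod 2^32
  }
  return hash.toString(16).padStart(8, "0");
}

// The URL changes whenever the content changes, so each URL's content is fixed.
function cacheBustedUrl(resource: string, body: string): string {
  return `/api/${resource}?v=${fnv1a(body)}`;
}
```

Because the content behind such a URL never changes, it can be served with a long lifetime such as `Cache-Control: max-age=31536000, immutable`; the client still has to learn the current URL from some other response that is not cached that way.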

We're using 304 NOT MODIFIED

That's about the worst possible way to address the problem. Most of the cost of retrieval is latency. If you are replying with a 304 response, you have already paid most of the cost; this is particularly pronounced when you are dealing with small pieces of data. HTTP/2 helps (compared with HTTP/1.0 and 1.1) but doesn't eliminate the cost.

I would also question a lot of the assumptions made in your original question.


answered Sep 28 '22 06:09

symcbean
