
Is This Idea for Loading Online Content in Bulk Feasible?

I came up with an idea a long time ago and never got around to implementing it, and I would like to know whether it is practical: would it significantly decrease page loading times in modern browsers? It relies on two observations: related tasks are often completed more quickly when they are done together in bulk, and the browser could be downloading content from other pages, guided by a statistical model, instead of sitting idle while the user reads the current one. Below is an excerpt from what I originally wrote, which describes the idea.


Description.

When people visit websites, I conjecture that a probability function P(q, t), where q is an integer representing the ID of a website and t is a non-negative integer representing the time of day, can predict the sequence of webpages visited by a typical user accurately enough to justify requesting and loading the HTML documents the user is going to read in advance. For a given website, let the document that appears to be its "main page", through which users reach the other sections, be the root of a tree structure. The probability that the user will visit the root node can be handled in two ways. If the user allows a process to run automatically at operating-system startup and pre-fetch webpages (using the process elaborated below) from websites they frequently open in the browser, the probability function that decides whether a given website has its webpages pre-fetched can be a self-adapting heuristic model based on the user's history (or set by manual input). Otherwise, if the user does not want such a process, the value of P at the root node is irrelevant, since pre-fetching only begins after the user visits the main page of the website.
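
To make the structure concrete, here is a minimal sketch in Python of the tree I have in mind, assuming P(q, t) is approximated by per-hour visit counts taken from the user's history; all names (PageNode, record_visit, and so on) are hypothetical, not part of any existing library.

    from collections import defaultdict


    class PageNode:
        def __init__(self, url, parent=None):
            self.url = url
            self.parent = parent
            self.children = {}              # url -> PageNode
            self.visits = defaultdict(int)  # hour of day (0-23) -> visit count

        def child(self, url):
            """Return the child node for url, creating it if necessary."""
            if url not in self.children:
                self.children[url] = PageNode(url, parent=self)
            return self.children[url]

        def record_visit(self, hour):
            """Log that this page was opened at the given hour of day."""
            self.visits[hour] += 1

        def probability(self, hour):
            """Estimate P(this page | parent page, hour) from logged counts."""
            if self.parent is None:
                return 1.0  # the root is only consulted once the user is already on it
            total = self.parent.visits[hour]
            return self.visits[hour] / total if total else 0.0

With this, the node for the "WTF" section in the example below would report a probability of 71/80 at hour 7 if the main page had been opened 80 times at that hour and the section clicked on 71 of those visits.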

Each child in the tree described above is associated with its own probability function P(q, t) (this function can be a lookup table that stores time-webpage pairs). The sequences of webpages the user visits over time are therefore logged in this tree structure. For instance, at 7:00 AM there may be a 71/80 chance that I visit the "WTF" section on Reddit after loading the main page of that site. Based on the values of the probability function P at each node, chains of webpages extending up to a certain depth from the root, whose cumulative probability P_c exceeds a threshold P_min, are requested as soon as the user visits the main page of the site. If the download of one webpage finishes before another has been processed, a thread pool is used so that another core can parse the next webpage in the queue. In this manner, a large portion of the webpages the user actually clicks could hopefully be displayed much more quickly than they would be otherwise.
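
And a sketch, under the same assumptions, of the pre-fetch step itself: chains whose cumulative probability P_c stays above P_min (up to a maximum depth) are collected and then downloaded and parsed on a thread pool. fetch_and_parse is a hypothetical stand-in for the browser's download-and-parse pipeline.

    from concurrent.futures import ThreadPoolExecutor


    def likely_pages(root, hour, p_min, max_depth):
        # Depth-first walk yielding URLs whose chain probability P_c exceeds
        # p_min; since P_c can only shrink as the chain grows, an unlikely
        # node lets us prune its entire subtree.
        stack = [(child, child.probability(hour), 1)
                 for child in root.children.values()]
        while stack:
            node, p_chain, depth = stack.pop()
            if p_chain < p_min or depth > max_depth:
                continue
            yield node.url
            stack.extend(
                (c, p_chain * c.probability(hour), depth + 1)
                for c in node.children.values()
            )


    def prefetch(root, hour, fetch_and_parse, p_min=0.5, max_depth=3, workers=4):
        # Download and parse the selected pages concurrently on a thread pool.
        urls = list(likely_pages(root, hour, p_min, max_depth))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(urls, pool.map(fetch_and_parse, urls)))

For the Reddit example, the "WTF" node would be pre-fetched at 7:00 AM for any P_min below 71/80, and anything reached through it would be considered with that 71/80 already factored into its P_c.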


I left out many details and optimizations since I just wanted this to be a brief description of what I was thinking about. Thank you very much for taking the time to read this post; feel free to ask any further questions if you have them.

asked Dec 27 '22 by void-pointer

2 Answers

Interesting idea -- and there have been some implementations of pre-fetching in browsers, though without the brains you propose, which could help a lot. I think there are some flaws with this plan:

a) Web page browsing, in most cases, is already fast enough for most purposes.
b) Bandwidth is becoming metered -- if I'm just glancing at the home page, do I as a user want to pay to download the other pages? Moreover, in the cases where this sort of thing could be useful (e.g. a slow 3G connection), bandwidth tends to be more tightly metered and the link may not handle concurrency well (e.g. CDMA 3G connections).
c) From a server operator's point of view, I'd rather just serve requested pages in most cases. Rendering pages that never get seen costs me cycles and bandwidth. If you are like a lot of folks and run on some cloud computing platform, you are paying by the cycle and the byte.
d) It would require rebuilding lots of analytics systems, many of which still operate on the assumption that request == impression.

The short summary is that there really isn't a need to presage what people will view in order to speed up serving and rendering pages. Where something like this could be really useful is the "hey, if you liked X you will probably like Y" case, popping up links to that content (or those products) for folks.

answered Jun 02 '23 by Wyatt Barnett


Windows does something similar with disk access: it "knows" that you are likely to start, say, Firefox at a certain time and preloads it.

SuperFetch also keeps track of what times of day those applications are used, which allows it to intelligently pre-load information that is expected to be used in the near future.

http://en.wikipedia.org/wiki/Windows_Vista_I/O_technologies#SuperFetch

answered Jun 02 '23 by Meh