Smart progress bar ETA computation

In many applications, we have some progress bar for a file download, for a compression task, for a search, etc. We all often use progress bars to let users know something is happening. And if we know some details like just how much work has been done and how much is left to do, we can even give a time estimate, often by extrapolating from how much time it's taken to get to the current progress level.

[Image: compression ETA screenshot (source: jameslao.com)]

But we've also seen programs in which this Time Left "ETA" display is just comically bad. It claims a file copy will be done in 20 seconds, then one second later it says it's going to take 4 days, then it flickers again to 20 minutes. It's not only unhelpful, it's confusing! The reason the ETA varies so much is that the progress rate itself can vary and the programmer's math can be overly sensitive to that variation.

Apple sidesteps this by avoiding any accurate prediction and just giving vague estimates!

[Image: Apple's vague evasion (source: autodesk.com)]

That's annoying too: do I have time for a quick break, or is my task going to be done in 2 more seconds? If the prediction is too fuzzy, it's pointless to make any prediction at all.

Easy but wrong methods

As a first-pass ETA computation, probably we all just write a function like this: if p is the fraction that's done already, and t is the time it's taken so far, we output t*(1-p)/p as the estimate of how long it's going to take to finish. This simple ratio works "OK" but it's also terrible, especially at the end of the computation. If a slow download keeps a copy crawling along overnight, and finally in the morning something kicks in and the copy starts going at full speed, 100X faster, your ETA at 90% done may say "1 hour", and 10 seconds later you're at 95% and the ETA will say "30 minutes", which is clearly an embarrassingly poor guess. In this case "10 seconds" is a much, much, much better estimate.
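To make the ratio concrete, here is a minimal sketch of that naive estimator. The function name and the "9 hours to reach 90%" figure are assumptions picked only to reproduce the numbers in the example above.

```python
def naive_eta(fraction_done: float, elapsed_seconds: float) -> float:
    """Remaining time extrapolated from the average rate so far: t * (1 - p) / p."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


# Assumed scenario: the copy crawled for 9 hours to reach 90%,
# then jumped to 95% in the next 10 seconds.
nine_hours = 9 * 3600
eta_at_90 = naive_eta(0.90, nine_hours)       # about 1 hour
eta_at_95 = naive_eta(0.95, nine_hours + 10)  # still about 28 minutes

print(f"ETA at 90%: {eta_at_90 / 60:.1f} min")
print(f"ETA at 95%: {eta_at_95 / 60:.1f} min")
```

The whole overnight history dominates the average, so the estimate barely reacts to the sudden speedup.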

When this happens you may think to change the computation to use recent speed, not average speed, to estimate the ETA. You take the average download rate or completion rate over the last 10 seconds, and use that rate to project how long completion will take. That performs quite well in the previous overnight-download-which-sped-up-at-the-end example, since it will give very good completion estimates at the end. But this still has big problems: it causes your ETA to bounce wildly when your rate varies quickly over a short period of time, and you get the "done in 20 seconds, done in 2 hours, done in 2 seconds, done in 30 minutes" rapid display of programming shame.
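A rough sketch of the recent-speed approach, to show where the bouncing comes from. The class name, the sample data, and the choice to keep one sample at or before the window start as the baseline are all assumptions for illustration.

```python
from collections import deque


class RecentRateEta:
    """ETA from the completion rate over roughly the last `window_seconds`."""

    def __init__(self, window_seconds: float = 10.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, fraction_done) pairs, oldest first

    def update(self, timestamp: float, fraction_done: float):
        self.samples.append((timestamp, fraction_done))
        # Drop old samples, but keep the newest one that is at or before the
        # window start so it can serve as the baseline.
        window_start = timestamp - self.window_seconds
        while len(self.samples) > 2 and self.samples[1][0] <= window_start:
            self.samples.popleft()
        t0, p0 = self.samples[0]
        if timestamp <= t0 or fraction_done <= p0:
            return None  # no measurable progress in the window yet
        rate = (fraction_done - p0) / (timestamp - t0)
        return (1.0 - fraction_done) / rate


# Made-up progress samples: steady, then nearly stalled, then a burst.
estimator = RecentRateEta(window_seconds=10.0)
history = [(0, 0.0), (10, 0.10), (20, 0.101), (30, 0.301)]
etas = [estimator.update(t, p) for t, p in history]
print(etas)  # None, then roughly 90 s, roughly 8990 s, roughly 35 s
```

Three consecutive updates swing from a minute and a half to two and a half hours and back to half a minute, which is exactly the flicker described above.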

The actual question:

What is the best way to compute an estimated time of completion of a task, given the time history of the computation? I am not looking for links to GUI toolkits or Qt libraries. I'm asking about the algorithm to generate the most sane and accurate completion time estimates.

Have you had success with math formulas? Some kind of averaging, maybe by using the mean of the rate over 10 seconds with the rate over 1 minute with the rate over 1 hour? Some kind of artificial filtering like "if my new estimate varies too much from the previous estimate, tone it down, don't let it bounce too much"? Some kind of fancy history analysis where you integrate progress versus time advancement to find standard deviation of rate to give statistical error metrics on completion?

What have you tried, and what works best?

asked Jun 01 '09 01:06

SPWorley

1 Answer

Original Answer

The company that created this site apparently makes a scheduling system that answers this question in the context of employees writing code. It works by running a Monte Carlo simulation of the future based on the past.

Appendix: Explanation of Monte Carlo

This is how this algorithm would work in your situation:

You model your task as a sequence of microtasks, say 1000 of them. Suppose that an hour later you have completed 100 of them. Now you run the simulation for the remaining 900 steps by randomly selecting 90 completed microtasks, adding their times and multiplying by 10. That gives you one estimate; repeat N times and you have N estimates for the time remaining. Note that the average of these estimates will be about 9 hours, no surprises here. But by presenting the resulting distribution to the user you'll honestly communicate the odds, e.g. "with 90% probability this will take another 3-15 hours".
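Here is a small sketch of that resampling loop. The function names, the synthetic step durations, and the decision to pick the 90 microtasks without replacement are my assumptions, not part of any particular scheduling product.

```python
import random


def simulate_remaining_time(step_times, remaining_steps, sample_size,
                            trials=2000, seed=None):
    """Return sorted Monte Carlo estimates of the remaining time."""
    if not 0 < sample_size <= len(step_times):
        raise ValueError("sample_size must be between 1 and len(step_times)")
    rng = random.Random(seed)
    scale = remaining_steps / sample_size  # 900 / 90 = 10 in the example
    estimates = []
    for _ in range(trials):
        picked = rng.sample(step_times, sample_size)  # without replacement
        estimates.append(sum(picked) * scale)
    estimates.sort()
    return estimates


def central_interval(sorted_estimates, coverage=0.90):
    """Empirical interval holding roughly `coverage` of the estimates."""
    tail = (1.0 - coverage) / 2.0
    last = len(sorted_estimates) - 1
    return sorted_estimates[int(tail * last)], sorted_estimates[int((1.0 - tail) * last)]


# Synthetic history: 100 uneven microtasks that took one hour in total.
data_rng = random.Random(0)
raw = [data_rng.expovariate(1.0) for _ in range(100)]
total = sum(raw)
step_times = [3600.0 * x / total for x in raw]

estimates = simulate_remaining_time(step_times, remaining_steps=900,
                                    sample_size=90, seed=1)
low, high = central_interval(estimates)
mean_hours = sum(estimates) / len(estimates) / 3600.0
print(f"mean {mean_hours:.2f} h, 90% interval {low / 3600:.2f}-{high / 3600:.2f} h")
```

The mean lands near 9 hours as expected; the width of the interval is what you would show to the user.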

This algorithm, by definition, produces a complete result if the task in question can be modeled as a bunch of independent, random microtasks. You can get a better answer only if you know how the task deviates from this model: for example, installers typically have a download/unpacking/installing task list, and the speed of one stage cannot predict the speed of another.

Appendix: Simplifying Monte Carlo

I'm not a statistics guru, but I think if you look closer at the simulation in this method, it will always return a normal distribution, since it is a sum of a large number of independent random variables. Therefore, you don't need to perform it at all. In fact, you don't even need to store all the completed times, since you'll only need their sum and the sum of their squares.

In maybe not very standard notation,

    sigma = sqrt(sum_of_times_squared - sum_of_times^2 / elapsedSteps)
    scaling = 900/100          // that is (totalSteps - elapsedSteps) / elapsedSteps
    lowerBound = sum_of_times*scaling - 3*sigma*sqrt(scaling)
    upperBound = sum_of_times*scaling + 3*sigma*sqrt(scaling)

With this, you can output the message saying that the thing will end between [lowerBound, upperBound] from now with some fixed probability (should be about 95%, but I probably missed some constant factor).
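For what it's worth, here is how that shortcut might look as runnable code. This is a sketch under my own assumptions: the function and parameter names are made up, the spread is computed from the per-step variance (sum of squares over n, minus the squared mean), and the multiplier z is left as a parameter because the exact coverage depends on it (for a normal distribution, z = 2 covers about 95% and z = 3 about 99.7%).

```python
import math


def normal_bounds(sum_of_times, sum_of_times_squared,
                  elapsed_steps, total_steps, z=3.0):
    """Bounds on remaining time using only the running sum and sum of squares."""
    remaining_steps = total_steps - elapsed_steps
    mean_step = sum_of_times / elapsed_steps
    # Per-step variance; clamped at 0 to absorb floating-point round-off.
    variance = max(sum_of_times_squared / elapsed_steps - mean_step ** 2, 0.0)
    expected = mean_step * remaining_steps
    # A sum of independent steps has variance (per-step variance) * (step count).
    spread = z * math.sqrt(variance * remaining_steps)
    return max(expected - spread, 0.0), expected + spread


# Made-up history: 100 steps in one hour, half took 18 s and half took 54 s.
times = [18.0] * 50 + [54.0] * 50
lower, upper = normal_bounds(sum(times), sum(t * t for t in times),
                             elapsed_steps=100, total_steps=1000)
print(f"between {lower / 3600:.2f} h and {upper / 3600:.2f} h remaining")
```

Only two running totals need to be kept while the task runs, so this is cheap enough to recompute on every progress update.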

answered Sep 19 '22 05:09

ilya n.

Tags:

language-agnostic

user-interface

algorithm