How to estimate download time remaining (accurately)?

Sure, you could divide the remaining file size by the current download speed, but if your download speed fluctuates (and it will), this doesn't produce a very nice result. What's a better algorithm for producing smoother countdowns?

asked May 06 '10 by mpen

People also ask

How do you calculate remaining download time?

All you need to remember is this simple formula: File Size in Megabytes / (Download Speed in Megabits per Second / 8) = Time in Seconds.

How long should it take to download 1GB?

A 1GB file is equal to 1024MB, so on a 100 Mbit/s connection (12.5 MB/s) it should take about 81.9 seconds to download the file.


2 Answers

An exponential moving average is great for this. It provides a way to smooth your average so that each time you add a new sample the older samples become decreasingly important to the overall average. They are still considered, but their importance drops off exponentially, hence the name. And since it's a "moving" average, you only have to keep a single number around.

In the context of measuring download speed the formula would look like this:

averageSpeed = SMOOTHING_FACTOR * lastSpeed + (1-SMOOTHING_FACTOR) * averageSpeed; 

SMOOTHING_FACTOR is a number between 0 and 1. The higher this number, the faster older samples are discarded. As you can see in the formula, when SMOOTHING_FACTOR is 1 you are simply using the value of your last observation. When SMOOTHING_FACTOR is 0 averageSpeed never changes. So, you want something in between, and usually a low value to get decent smoothing. I've found that 0.005 provides a pretty good smoothing value for an average download speed.

lastSpeed is the last measured download speed. You can get this value by running a timer every second or so to calculate how many bytes have downloaded since the last time you ran it.

averageSpeed is, obviously, the number that you want to use to calculate your estimated time remaining. Initialize this to the first lastSpeed measurement you get.
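To make that concrete, here is a minimal sketch of the approach in Python. The class and method names (DownloadEta, update, eta_seconds) and the roughly once-per-second sampling interval are my own illustrative assumptions, not part of the original answer:

    SMOOTHING_FACTOR = 0.005  # low value = heavy smoothing, per the suggestion above

    class DownloadEta:
        def __init__(self, total_bytes):
            self.total_bytes = total_bytes
            self.downloaded = 0
            self.average_speed = None  # initialized from the first measurement

        def update(self, new_bytes, elapsed_seconds):
            # Call this roughly once per second with the bytes received since the last call.
            self.downloaded += new_bytes
            last_speed = new_bytes / elapsed_seconds
            if self.average_speed is None:
                # Initialize the average to the first lastSpeed measurement.
                self.average_speed = last_speed
            else:
                # Exponential moving average: new samples count a little,
                # older samples decay away exponentially.
                self.average_speed = (SMOOTHING_FACTOR * last_speed
                                      + (1 - SMOOTHING_FACTOR) * self.average_speed)

        def eta_seconds(self):
            # Remaining bytes divided by the smoothed speed.
            if not self.average_speed:
                return None
            return (self.total_bytes - self.downloaded) / self.average_speed

Keep in mind that the right SMOOTHING_FACTOR depends on how often you sample; the 0.005 above is simply the value suggested in this answer.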

answered Sep 22 '22 by Ben Dolman


I wrote an algorithm years ago to predict time remaining in a disk imaging and multicasting program that used a moving average with a reset when the current throughput went outside of a predefined range. It would keep things smooth unless something drastic happened, then it would adjust quickly and then return to a moving average again. See example chart here:

[Example chart comparing actual throughput with an overall average, a plain moving average, and the reset-based moving average over the course of a transfer]

The thick blue line in that example chart is the actual throughput over time. Notice that throughput is low during the first half of the transfer and then jumps up dramatically in the second half. The orange line is an overall average; it never adjusts up far enough to give an accurate prediction of how long the transfer will take to finish. The gray line is a moving average (i.e. the average of the last N data points - in this graph N is 5, but in reality N might need to be larger to smooth enough). It recovers more quickly, but still takes a while to adjust, and the larger N is, the longer it takes. So if your data is pretty noisy, N will have to be larger and the recovery time will be longer.

The green line is the algorithm I used. It goes along just like a moving average, but when the data moves outside a predefined range (designated by the light thin blue and yellow lines), it resets the moving average and jumps up immediately. The predefined range can also be based on standard deviation so it can adjust to how noisy the data is automatically. I just threw these values into Excel to diagram them for this answer so it's not perfect, but you get the idea.
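As a rough illustration (not the original implementation), the reset-on-deviation idea might look something like the sketch below. The names, the window size, and the fixed percentage band are assumptions standing in for the predefined or standard-deviation-based range described above:

    from collections import deque

    WINDOW = 5        # N, the number of recent samples to average
    TOLERANCE = 0.5   # reset if a sample is more than 50% away from the current average

    class ResettingMovingAverage:
        def __init__(self):
            self.samples = deque(maxlen=WINDOW)

        def add(self, speed):
            average = self.current()
            if average is not None and abs(speed - average) > TOLERANCE * average:
                # Throughput moved outside the predefined range:
                # drop the history and let the average jump immediately.
                self.samples.clear()
            self.samples.append(speed)
            return self.current()

        def current(self):
            if not self.samples:
                return None
            return sum(self.samples) / len(self.samples)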

Data could be contrived to make this algorithm fail to be a good predictor of time remaining though. The bottom line is that you need to have a general idea of how you expect the data to behave and pick an algorithm accordingly. My algorithm worked well for the data sets I was seeing, so we kept using it.

One other important tip: developers usually ignore setup and teardown times in their progress bars and time estimate calculations. This results in the eternal 99% or 100% progress bar that just sits there for a long time (while caches are flushed or other cleanup work happens), or in wild early estimates while directory scanning or other setup work accrues time but no percentage progress, which throws everything off. You can run several tests that include the setup and teardown times, estimate how long those phases take on average (or based on the size of the job), and fold that time into the progress bar. For example, treat the first 5% as setup work, the last 10% as teardown work, and the 85% in the middle as the download or whatever repeating process you're tracking, as sketched below. This can help a lot too.
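For example, one simple way to fold those phases into a single progress value (reusing the illustrative 5% / 85% / 10% split from above; the names and split are assumptions, not from the original answer):

    SETUP_SHARE = 0.05     # first 5% of the bar: setup work
    DOWNLOAD_SHARE = 0.85  # middle 85%: the download itself
    TEARDOWN_SHARE = 0.10  # last 10%: teardown/cleanup work

    def overall_progress(setup_done, download_done, teardown_done):
        # Each argument is the fraction (0.0 to 1.0) of that phase completed.
        return (SETUP_SHARE * setup_done
                + DOWNLOAD_SHARE * download_done
                + TEARDOWN_SHARE * teardown_done)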

answered Sep 22 '22 by rcronk