Norms, rules or guidelines for calculating and showing "ETA/ETC" for a process

Question

ETC = "Estimated Time of Completion"

I'm timing each pass through a loop and showing the user some numbers that tell him/her approximately how much time the full process will take. I feel like this is a common thing that everyone does on occasion and I would like to know if you have any guidelines that you follow.

Here's an example I'm using at the moment:

int itemsLeft; //This holds the number of items to run through.
double timeLeft;
TimeSpan TsTimeLeft;
List<double> avrage = new List<double>();
double milliseconds; //Despite the name, this accumulates seconds (0.01 per 10 ms tick) for the current loop, reset every loop.

//The background worker calls this event once for each item. The total number 
//of items are in the hundreds for this particular application and every loop takes
//roughly one second.
private void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    //An item has been completed!

    itemsLeft--;
    avrage.Add(milliseconds);

    //Get an average time per item and multiply it by the number of items left.
    timeLeft = avrage.Sum() / avrage.Count * itemsLeft;
    TsTimeLeft = TimeSpan.FromSeconds(timeLeft);

    this.Text = String.Format("ETC: {0}:{1:D2}:{2:D2} ({3:N2}s/file)", 
        TsTimeLeft.Hours, 
        TsTimeLeft.Minutes, 
        TsTimeLeft.Seconds, 
        avrage.Sum() / avrage.Count);

    //Only using the last 20-30 logs in the calculation to prevent an unnecessarily long List<>.
    if (avrage.Count > 30) 
        avrage.RemoveRange(0, 10);

    milliseconds = 0;
}

//this.profiler.Interval = 10;
private void profiler_Tick(object sender, EventArgs e)
{
    milliseconds += 0.01;
}

As I am a programmer at the very start of my career I'm curious to see what you would do in this situation. My main concern is that I recalculate and update the UI on every loop. Is this bad practice?

Are there any dos and don'ts when it comes to estimations like this? Are there any preferred ways of doing it, e.g. update every second, update every ten logs, calculate and update the UI separately? Also, when would an ETA/ETC be a good or a bad idea?

usr-local-ΕΨΗΕΛΩΝ · Accepted Answer

The real problem with estimating the time taken by a process is quantifying the workload. Once you can quantify that, you can make a better estimate.

Examples of good estimates

File system I/O or network transfer. Whether or not file systems perform badly, you can know the total number of bytes to be processed in advance and you can measure the speed. With those two figures, and by tracking how many bytes you have transferred so far, you get a good estimate. Random factors may affect your estimate (e.g. another application starts in the meantime), but you still get a meaningful value.
Encryption of large streams, for the same reasons. Even if you are computing an MD5 hash, you always know how many blocks have been processed, how many remain and the total.
Item synchronization. This is a little trickier. If you can assume that the per-unit workload is constant, or that the variance of the time required to process an item is low or insignificant, then you can make another good estimate of the process. Take email synchronization: if you don't know the byte size of the messages (otherwise you fall into the first case) but common practice tells you that the majority of emails are about the same size, then you can use the mean time taken to download/upload the emails processed so far to estimate the time taken to process a single email. This won't work in 100% of cases and is subject to error, but you still see the progress bar advancing on a large account.
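To make the first case concrete, here is a minimal sketch of a byte-based estimate. It assumes the total size is known up front and that throughput so far is representative of what remains. `ByteEtaEstimator` and its members are hypothetical names invented for this illustration, not part of any library.

```csharp
using System;

// Hypothetical helper: estimates the remaining time from the bytes processed so far.
public sealed class ByteEtaEstimator
{
    private readonly long totalBytes;

    public ByteEtaEstimator(long totalBytes)
    {
        if (totalBytes <= 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));
        this.totalBytes = totalBytes;
    }

    // Returns null while no throughput can be measured yet.
    public TimeSpan? Remaining(long bytesDone, TimeSpan elapsed)
    {
        if (bytesDone <= 0 || elapsed <= TimeSpan.Zero) return null;
        if (bytesDone >= totalBytes) return TimeSpan.Zero;

        // Average throughput since the start of the operation.
        double bytesPerSecond = bytesDone / elapsed.TotalSeconds;
        double secondsLeft = (totalBytes - bytesDone) / bytesPerSecond;
        return TimeSpan.FromSeconds(secondsLeft);
    }
}
```

Returning null before the first measurement lets the UI show something like "calculating..." instead of a meaningless number.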

In general the rule is that you can make a good estimate of ETC/ETA (ETA is actually the date and time at which the operation is expected to complete) if you have a homogeneous process whose numbers you know. Homogeneity guarantees that the time to process one work item is comparable to the others, i.e. the time taken to process previous items can be used to estimate future ones. The numbers are what let you do the calculation correctly.
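For the homogeneous, per-item case, a rough sketch could keep a sliding window of recent item durations and multiply the mean by the number of items left, much like the code in the question. `ItemEtaEstimator` is a hypothetical name, and the default window of 30 samples is an assumption borrowed from the question, not a recommendation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: mean of the last N item durations times the items left.
public sealed class ItemEtaEstimator
{
    private readonly Queue<double> window = new Queue<double>();
    private readonly int capacity;
    private double sum; // running sum of the samples currently in the window

    public ItemEtaEstimator(int capacity = 30)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public void AddSample(TimeSpan itemDuration)
    {
        window.Enqueue(itemDuration.TotalSeconds);
        sum += itemDuration.TotalSeconds;
        // Drop the oldest sample once the window is full.
        if (window.Count > capacity) sum -= window.Dequeue();
    }

    public double AverageSeconds => window.Count == 0 ? 0 : sum / window.Count;

    // Returns null until at least one item has been timed.
    public TimeSpan? Remaining(int itemsLeft)
    {
        if (window.Count == 0) return null;
        return TimeSpan.FromSeconds(AverageSeconds * Math.Max(itemsLeft, 0));
    }
}
```

Each item's duration could be measured with `System.Diagnostics.Stopwatch` around the work itself, which is generally more reliable than counting ticks of a UI timer.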

Examples of bad estimates

Operations on a number of files of unknown size. This time you know only how many files you want to process (e.g. to download) but you don't know their sizes in advance. Once the file sizes have a high variance you run into trouble. If you have downloaded half of the files, but these were the smallest and add up to 10% of the total bytes, can you be said to be halfway? No! You just see the progress bar grow quickly to 50% and then much more slowly.
Heterogeneous processes, e.g. Windows installations. As pointed out by @HansPassant, Windows installations provide a worse-than-bad estimate. Installing Windows software involves several processes, including file copy (this can be estimated), registry modifications (usually never estimated) and execution of transactional code. The real problem is the last. Transactional processes involving execution of custom installer code are discussed below.
Execution of generic code. This can never be estimated. A code fragment contains conditional statements, and which path is executed depends on conditions external to the code. This means, for example, that a program behaves differently depending on whether you have a printer installed or not, whether you have a local or a domain account, etc.
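The first bad case can be shown with a little arithmetic. The sketch below compares progress measured by file count with progress measured by bytes. The file sizes are made up for illustration, and `ProgressComparison` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helper comparing two ways of measuring progress over the same files.
public static class ProgressComparison
{
    // Fraction of files completed, ignoring their sizes.
    public static double ByCount(long[] sizes, int filesDone)
        => (double)filesDone / sizes.Length;

    // Fraction of bytes completed, assuming files are processed in array order.
    public static double ByBytes(long[] sizes, int filesDone)
        => (double)sizes.Take(filesDone).Sum() / sizes.Sum();
}
```

With five files of 20 bytes followed by five files of 180 bytes, `ByCount` reports 50% after the fifth file while `ByBytes` reports only 10%, which is exactly the fast-then-slow progress bar described above.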

Conclusions

Estimating the duration of a software process is neither an impossible nor an exact (deterministic) task.

It's not impossible because, even in the case of code fragments, you can find a model for your code (an LU factorization, for example, can be estimated). Or you might redesign your code, splitting it into an estimation phase, where you first determine the branch conditions, and an execution phase, where all pre-determined branches are taken. I said "might" because this task is in practice impossible: most code determines branches as effects of previous conditions, meaning that estimating a branch actually involves running the code. It's a chicken-and-egg problem.
It's not a deterministic process. Computer systems, especially multitasking ones, are affected by a number of random factors that may have an impact on the process you are estimating. You will never get a correct estimate before running your process. At most, you can detect external factors and re-estimate. The gap between your estimate and the real duration of the process converges to zero as you get closer to the end of the process: lim [x->N] |est(x) - N| = 0, where N is the real duration of the process and est(x) is the total duration estimated at elapsed time x.
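The convergence claim can be illustrated with a small simulation. This is only a sketch under the assumption that the estimate is recomputed after every item as elapsed time plus the mean item time multiplied by the items left. `Reestimation.Errors` is a hypothetical helper and the durations in the usage note are invented.

```csharp
using System;
using System.Linq;

// Hypothetical helper: tracks how far the re-estimated total is from the real total.
public static class Reestimation
{
    // Returns |estimated total - actual total| after each completed item.
    public static double[] Errors(double[] itemSeconds)
    {
        double actualTotal = itemSeconds.Sum();
        var errors = new double[itemSeconds.Length];
        double elapsed = 0;

        for (int i = 0; i < itemSeconds.Length; i++)
        {
            elapsed += itemSeconds[i];
            int done = i + 1;
            int left = itemSeconds.Length - done;

            // Re-estimate: time already spent plus mean item time for what is left.
            double estimatedTotal = elapsed + (elapsed / done) * left;
            errors[i] = Math.Abs(estimatedTotal - actualTotal);
        }
        return errors;
    }
}
```

For item durations of 1, 1, 1, 5, 5 and 5 seconds the error starts at 12 seconds and reaches 0 after the last item. The error is not guaranteed to shrink at every step, only to vanish once nothing is left to estimate.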

Hans Passant · Answer

If your user interface is so obscure that you have to explain that ETC doesn't mean Etcetera then you are doing it wrong. Every user understands what a progress bar does, don't help.

Nothing is quite as annoying as an inaccurate progress bar. Particularly ones that promise a quick finish but then don't deliver. I'd give the progress bar displayed by any installer on Windows as a good example of one that is fundamentally broken. Just not a shining example of an implementation that you should pursue.

Such a progress bar is broken because it is utterly impossible to guess up front how long it is going to take to install a program. File systems have very unpredictable perf. This is a very common problem with estimating execution time. Better UI models are the spinning dots you'd see in a video player and many programs in Windows 8. Or the marquee style supported by the common ProgressBar control. Just feedback that says "I'm not dead, working on it". Even the hour-glass cursor is better than a bad estimate. If you have something to report beyond a technicality that no user is really interested in then don't hesitate to display that. Like the number of files you've processed or the number of kilobytes you've downloaded. The actual value of the number isn't that useful, seeing the rate at which it increases is the interesting tidbit.
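As a rough sketch of the "report what you actually know" approach, the helper below builds a status line from the number of files processed and the observed rate, with no promise about when the work will finish. `ActivityStatus.Format` and the wording of the message are assumptions made for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: reports activity (count and rate) instead of an ETA.
public static class ActivityStatus
{
    public static string Format(int filesDone, TimeSpan elapsed)
    {
        // No rate can be computed before any time has passed.
        if (elapsed <= TimeSpan.Zero)
            return string.Format(CultureInfo.InvariantCulture, "{0} files processed", filesDone);

        double perSecond = filesDone / elapsed.TotalSeconds;
        return string.Format(CultureInfo.InvariantCulture,
            "{0} files processed ({1:F1} files/s)", filesDone, perSecond);
    }
}
```

In a WinForms application this text could sit next to a `ProgressBar` whose `Style` is set to `ProgressBarStyle.Marquee`, so the user sees activity without being given a finish time.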

Tags: c#, winforms

Hjalmar Z
