Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding an ETA to an embedded loop sequence

UPDATE 1:

Some of the solutions offered below seem good. However, I only know the amount of times a loop will iterate after it's parent loop's iterations have been determined. So I can't count all the iterations beforehand.

ORIGINAL QUESTION:

I have embedded loops in a program similar to this:

Prog1:

using System;
using System.Threading;

namespace time_remaining_loop_strip
{
    class Program
    {
        static void Main(string[] args)
        {
            var random = new Random();
            Console.Clear();

            // Simulate initiation delay
            Console.WriteLine("initiate");
            Thread.Sleep(random.Next(100, 1000));

            int intCount = random.Next(1, 10);

            for (int loop1 = 0; loop1 <= intCount; loop1++) 
            {
                // Simulate loop1 delay
                Console.WriteLine("\tloop1");
                Thread.Sleep(random.Next(100, 1000));

                for (int loop2 = 0; loop2 <= random.Next(1, 10); loop2++) 
                {
                    // Simulate loop2 delay
                    Console.WriteLine("\t\tloop2");
                    Thread.Sleep(random.Next(100, 1000));

                    for (int loop3 = 0; loop3 <= random.Next(1, 10); loop3++) 
                    {
                        // Simulate loop3 delay
                        Console.WriteLine("\t\t\tloop3");
                        Thread.Sleep(random.Next(100, 1000));

                        for (int loop4 = 0; loop4 <= random.Next(1, 10); loop4++) 
                        {
                            // Simulate loop4 delay
                            Console.WriteLine("\t\t\t\tloop4");
                            Thread.Sleep(random.Next(100, 1000));
                        }
                    }
                }
            }
        }
    }
}

I am trying to display Processing Time Remaining (ETA), so I can see a rough estimate of the amount of time remaining before the loop sequence above completes

I now have another bit of code which does display an ETA which works fine when the loop is very simplistic:

Prog2:

using System;
using System.Threading;

namespace time_remaining
{
    class Program
    {
        public static TimeSpan ComputeRemaining((int count, DateTime time) start, (int count, DateTime time) current, int end) =>
            current.count - start.count == 0
            ? TimeSpan.MaxValue
            : TimeSpan.FromSeconds((end - current.count) * current.time.Subtract(start.time).TotalSeconds / (current.count - start.count));

        static void Main(string[] args)
        {
            Console.Clear();

            var random = new Random();
            int Count = random.Next(10, 60);
            DateTime startTime = DateTime.Now;

            for (int i = 0; i <= Count; i++)
            {
                Thread.Sleep(random.Next(100, 2000));
                TimeSpan timeRemaining = ComputeRemaining((0, startTime), (i, DateTime.Now), Count);

                Console.SetCursorPosition(0,0);
                Console.Write("ETA: ");
                Console.Write(String.Format("{0} Days, {1} Hours, {2} Minutes, {3} Seconds", timeRemaining.Days.ToString().PadLeft(3,'0'), timeRemaining.Hours.ToString().PadLeft(2,'0'), timeRemaining.Minutes.ToString().PadLeft(2,'0'), timeRemaining.Seconds.ToString().PadLeft(2,'0')));
            }
        }
    }
}

When I try to combine the ETA aspect of Prog1 into Prog2, it does not seem to work well:

Prog3 = Prog1+Prog2:

using System;
using System.Threading;

namespace time_remaining_loop_strip
{
    class Program
    {
        public static TimeSpan ComputeRemaining((int count, DateTime time) start, (int count, DateTime time) current, int end) =>
            current.count - start.count == 0
            ? TimeSpan.MaxValue
            : TimeSpan.FromSeconds((end - current.count) * current.time.Subtract(start.time).TotalSeconds / (current.count - start.count));

        static void Main(string[] args)
        {
            DateTime startTime = DateTime.Now;
            
            var random = new Random();
            Console.Clear();

            // Simulate initiation delay
            //Console.WriteLine("initiate");
            Thread.Sleep(random.Next(100, 1000));

            int intCount = random.Next(1, 10);

            for (int loop1 = 0; loop1 <= intCount; loop1++) 
            {
                // Simulate loop1 delay
                //Console.WriteLine("\tloop1");
                Thread.Sleep(random.Next(100, 1000));

                for (int loop2 = 0; loop2 <= random.Next(1, 10); loop2++) 
                {
                    // Simulate loop2 delay
                    //Console.WriteLine("\t\tloop2");
                    Thread.Sleep(random.Next(100, 1000));

                    for (int loop3 = 0; loop3 <= random.Next(1, 10); loop3++) 
                    {
                        // Simulate loop3 delay
                        //Console.WriteLine("\t\t\tloop3");
                        Thread.Sleep(random.Next(100, 1000));

                        for (int loop4 = 0; loop4 <= random.Next(1, 10); loop4++) 
                        {
                            // Simulate loop4 delay
                            //Console.WriteLine("\t\t\t\tloop4");
                            Thread.Sleep(random.Next(100, 1000));
                        }
                    }
                }

                TimeSpan timeRemaining = ComputeRemaining((0, startTime), (loop1, DateTime.Now), intCount);

                Console.SetCursorPosition(0,0);
                Console.Write("ETA: ");
                Console.Write(String.Format("{0} Days, {1} Hours, {2} Minutes, {3} Seconds", timeRemaining.Days.ToString().PadLeft(3,'0'), timeRemaining.Hours.ToString().PadLeft(2,'0'), timeRemaining.Minutes.ToString().PadLeft(2,'0'), timeRemaining.Seconds.ToString().PadLeft(2,'0')));
            }
        }
    }
}

This doesn't seem to work very well at all. It does display an ETA, but it has long delays before it shows anything, because of the way the loop is structured.

How can I update this so the ETA code displays an ETA more accurately and at more predictive intervals, as in every second?

like image 874
oshirowanen Avatar asked Nov 06 '22 05:11

oshirowanen


1 Answers

With the concession that you've just built a simple model of what's actually happening (you have some series of nested process of variable delay and count, which cannot even be determined until runtime), what you're asking for currently is a random number predictor.

There will be between n & m cycles (1 & 10 in your example) of between o & p duration (100 & 1000ms), but still random.. well, random quantized into bands. The way you've written it, this is dice-roll random (a dice has no memory), although in practice it seems more likely the duration of one cycle must somewhat imply the duration of the next, (which is how you've written ComputeRemaining) and within one band the count of one loop must help with the count of the next.

So despite its apparent simplicity Prog2 covers our examples.. given a known loop count, where each cycle takes a random duration (which is in fact pick(n,m)^3*pick(o,p).. but this is still just a random number) - predict the end. For reporting purposes, you'll want to refactor this to consider the inner loops as well too, but it's effectively the same process. (^3 is a simplification, it's actually a series of independent picks multiplied)

Ok, so we don't need to talk about time/delays (I mean.. you clearly want that, but it's just some number that represents the future - a TimeSpan is an long count of x ticks since some time... a finish time is just Now + x*tick). So we can simplify this into a long predictor.

Setup

interface IAvger
{
    public double Avg { get; }
}
interface IAdder
{
    void Add(long value);
}
class Mean
{
    private int _count = 0;

    public double Total { get; private set; } = 0;
    public double? Avg => _count == 0 ? null : (double?)(Total / _count);

    public void Add(double val)
    {
        Total += val;
        _count++;
    }
}

You can ignore the interfaces (I used them when switching out potential solutions). Class Mean should be familiar... it calculates the mean average of a number of values, scaling/adapting as more values are found.

/// <summary>
/// Equivalent to your ComputeRemaining
/// </summary>
class RunningAvg : IAvger, IAdder
{
    private Mean _mean = new Mean();
    private readonly double _guess;
    public RunningAvg(double guess)
    {
        _guess = guess;
    }

    public double Avg => _mean.Avg ?? _guess;

    public void Add(long value) => _mean.Add(value);
}

Here's an equivalent to your ComputeRemaining. The value of guess helps an early prediction when nothing else is known (vs the equivalent of TimeSpan.Max)

/// <summary>
/// Drop the lowest and highest value
/// - Fairly typical in stats, however note this is only one biggest/smallest
/// (will work best when the standard devation is low, and outliers
/// are rare)
/// </summary>
class IgnoreExtremes : IAvger, IAdder
{
    private long? _worst;
    private long? _best;
    private Mean _mean = new Mean();
    private readonly int _wt;
    private readonly double _guess;
    public IgnoreExtremes(double guess, int weight = 4)
    {
        _wt = weight;
        _guess = guess;
    }

    public long Best => _best ?? (long)Math.Round(_guess);
    public long Worst => _worst ?? (long)Math.Round(_guess);

    public double Avg
    {
        get
        {
            var avg = _mean.Avg;
            if (!avg.HasValue) return _guess;
            return (Best + _wt * avg.Value + Worst) / (2 + _wt);
        }
    }

    public void Add(long value)
    {
        if (!_best.HasValue)
        {
            _best = value;
        }
        else if (value < _best)
        {
            _mean.Add(_best.Value);
            _best = value;
        }
        else if (!_worst.HasValue)
        {
            _worst = value;
        }
        else if (value > _worst)
        {
            _mean.Add(_worst.Value);
            _worst = value;
        }
        else
        {
            _mean.Add(value);
        }
    }
}

Finally some stats! IgnoreExtremes suppresses the highest and lowest (single) value. In scientific sampling it's fairly typical to ignore these, however with a real random distribution of numbers (eg dice roll, or random.Next) only one extreme will be discarded. This should predict better numbers than RunningAvg. Note this is a form of weighted average, you can tune it (slightly) by supplying a weight value at construction (wt=4 is fairly common), or tying _wt to _mean.count (some code changes required)

class IgnoreStdDevOutlier : IAvger, IAdder
{
    private const int AT_LEAST = 5;
    private Mean _mean = new Mean();
    private readonly List<long> _vals = new List<long>();
    private readonly double _guess;
    //private long _tot;
    private readonly double _outlierStdDevMulti;
    public IgnoreStdDevOutlier(double guess, double outlierStdDevMulti = 2)
    {
        _guess = guess;
        _outlierStdDevMulti = outlierStdDevMulti;
    }

    private double StdDev()
    {
        var avg = Avg;
        double tot = 0;
        foreach (var item in _vals)
            tot += (item - avg) * (item - avg);
        return Math.Sqrt(tot / (_vals.Count - 1));
    }

    public void Add(long value)
    {
        _vals.Add(value);
        if (_vals.Count > AT_LEAST)
        {
            var avg = Avg;
            var sd = StdDev();
            var min = avg - _outlierStdDevMulti * sd;
            var max = avg + _outlierStdDevMulti * sd;
            //Ignore outliers
            if (value < min || value > max) return;
        }
        _mean.Add(value);
    }

    public double Avg => _mean.Avg ?? 0;
}

Another statistics approach is to ignore values more than n*StandardDeviation from average, where n is often 2 or 3 (you'll find conflicting opinions). All values seen are part of the standard deviation, but only those that aren't outliers are considered part of the average. This ends up acting like a suppression factor, preventing the estimate swinging too much.

Alright, so to run a test we need some sort of measuring class:

class Performance
{
    private readonly List<long> _set = new List<long>();
    private long _actual;

    public void Add(long item) => _set.Add(item);

    public void SetFinal(long final) => _actual = final;

    public void Report()
    {
        foreach (var item in _set)
        {
            Console.WriteLine("{0} {1} = {2}", item, _actual, (item / (double)_actual - 1) * 100);
        }
    }
}

A real guess can't know the final (_actual) value, but this class allows us to see how the guess is doing so far.

Finally the program class:

class Program
{
    const int MIN_MSEC = 100;
    const int MAX_MSEC = 1000;
    const int MIN_LOOP = 10;
    const int MAX_LOOP = 50;

    static void C(Random random)
    {
        int nm = random.Next(MAX_LOOP, MAX_LOOP);
        var guess = (double)((MAX_LOOP + MIN_LOOP) / 2 * (MAX_MSEC + MIN_MSEC) / 2);

        var predict = new RunningAvg(guess);
        //var predict = new IgnoreExtremes(guess);
        //var predict = new IgnoreStdDevOutlier(guess,3);
        var per = new Performance();
        long tot = 0;
        for (int i = 0; i <= nm; i++)
        {
            var op = random.Next(MIN_MSEC, MAX_MSEC);
            predict.Add(op);
            per.Add((long)Math.Round(predict.Avg * nm));
            tot += op;
        }
        per.SetFinal(tot);
        per.Report();
    }

    static void Main(string[] args)
    {
        var random = new Random();
        C(random);
    }
}

You can ignore the work is done in a method called C (just a side effect of the code - A was your Prog1, while B was Prog2). Within C try changing which of RunningAvg, IgnoreExtremes or IgnoreStdDevOutlier is uncommented. Because, again, what's written is dice-roll random, you cannot take a single run as a good benchmark. The next stage is to wrap this in repeat-runs and take the average of the standard deviation of the predictions (or perhaps only the late predictions - a user probably doesn't mind if an early estimate is far off, as long as by the end it isn't jumping around) - but I ran out of time. I find IgnoreStdDevOutlier, on average, converges on the right answer fairly swiftly being off by 0-1% by the end. IgnoreExtremes suffers from only ignoring one extreme (in each direction), so it's sort of a light version of IgnoreStdDevOutlier. If your data isn't random, and there are only occasionally extreme cases - it'll do just fine. RunningAvg doesn't actually perform terribly some of the time, other times it's off by double digit percentages all the way through. Random numbers, if only they were easy to predict.

Note on usage

  • Timespan.Ticks is a long. All of this is written to predict a long, which can be considered the difference between then and now. To directly switch use new Timespan(long ticks) to build the spans, and DateTime.Now.Subtract(startTime).Ticks to get a long from a span. Obviously all these classes could be re-written for TimeSpan instead of long.. unfortunately there isn't an easy generic where constraint that includes both long and TimeSpan
like image 71
Rudu Avatar answered Nov 11 '22 08:11

Rudu