Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ to calculate a moving average of a SortedList<dateTime,double>

I have a time series in the form of a SortedList<dateTime,double>. I would like to calculate a moving average of this series. I can do this using simple for loops. I was wondering if there is a better way to do this using linq.

my version:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var mySeries = new SortedList<DateTime, double>();
            mySeries.Add(new DateTime(2011, 01, 1), 10);
            mySeries.Add(new DateTime(2011, 01, 2), 25);
            mySeries.Add(new DateTime(2011, 01, 3), 30);
            mySeries.Add(new DateTime(2011, 01, 4), 45);
            mySeries.Add(new DateTime(2011, 01, 5), 50);
            mySeries.Add(new DateTime(2011, 01, 6), 65);

            var calcs = new calculations();
            var avg = calcs.MovingAverage(mySeries, 3);
            foreach (var item in avg)
            {
                Console.WriteLine("{0} {1}", item.Key, item.Value);                
            }
        }
    }
    class calculations
    {
        public SortedList<DateTime, double> MovingAverage(SortedList<DateTime, double> series, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < series.Count(); i++)
            {
                if (i >= period - 1)
                {
                    double total = 0;
                    for (int x = i; x > (i - period); x--)
                        total += series.Values[x];
                    double average = total / period;
                    result.Add(series.Keys[i], average);  
                }

            }
            return result;
        }
    }
}
like image 799
Andre P. Avatar asked Mar 02 '11 11:03

Andre P.


2 Answers

In order to achieve an asymptotical performance of O(n) (as the hand-coded solution does), you could use the Aggregate function like in

series.Skip(period-1).Aggregate(
  new {
    Result = new SortedList<DateTime, double>(), 
    Working = List<double>(series.Take(period-1).Select(item => item.Value))
  }, 
  (list, item)=>{
     list.Working.Add(item.Value); 
     list.Result.Add(item.Key, list.Working.Average()); 
     list.Working.RemoveAt(0);
     return list;
  }
).Result;

The accumulated value (implemented as anonymous type) contains two fields: Result contains the result list build up so far. Working contains the last period-1 elements. The aggregate function adds the current value to the Working list, builds the current average and adds it to the result and then removes the first (i.e. oldest) value from the working list.

The "seed" (i.e. the starting value for the accumulation) is build by putting the first period-1 elements into Working and initializing Result to an empty list.

Consequently tha aggregation starts with element period (by skipping (period-1) elements at the beginning)

In functional programming this is a typical usage pattern for the aggretate (or fold) function, btw.

Two remarks:

The solution is not "functionally" clean in that the same list objects (Working and Result) are reused in every step. I'm not sure if that might cause problems if some future compilers try to parallellize the Aggregate function automatically (on the other hand I'm also not sure, if that's possible after all...). A purely functional solution should "create" new lists at every step.

Also note that C# lacks powerful list expressions. In some hypothetical Python-C#-mixed pseudocode one could write the aggregation function like

(list, item)=>
  new {
    Result = list.Result + [(item.Key, (list.Working+[item.Value]).Average())], 
    Working=list.Working[1::]+[item.Value]
  }

which would be a bit more elegant in my humble opinion :)

like image 151
MartinStettner Avatar answered Sep 19 '22 14:09

MartinStettner


For the most efficient way possible to compute a Moving Average with LINQ, you shouldn't use LINQ!

Instead I propose creating a helper class which computes a moving average in the most efficient way possible (using a circular buffer and causal moving average filter), then an extension method to make it accessible to LINQ.

First up, the moving average

public class MovingAverage
{
    private readonly int _length;
    private int _circIndex = -1;
    private bool _filled;
    private double _current = double.NaN;
    private readonly double _oneOverLength;
    private readonly double[] _circularBuffer;
    private double _total;

    public MovingAverage(int length)
    {
        _length = length;
        _oneOverLength = 1.0 / length;
        _circularBuffer = new double[length];
    }       

    public MovingAverage Update(double value)
    {
        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Maintain totals for Push function
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled)
        {
            _current = double.NaN;
            return this;
        }

        // Compute the average
        double average = 0.0;
        for (int i = 0; i < _circularBuffer.Length; i++)
        {
            average += _circularBuffer[i];
        }

        _current = average * _oneOverLength;

        return this;
    }

    public MovingAverage Push(double value)
    {
        // Apply the circular buffer
        if (++_circIndex == _length)
        {
            _circIndex = 0;
        }

        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Compute the average
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled && _circIndex != _length - 1)
        {
            _current = double.NaN;
            return this;
        }
        else
        {
            // Set a flag to indicate this is the first time the buffer has been filled
            _filled = true;
        }

        _current = _total * _oneOverLength;

        return this;
    }

    public int Length { get { return _length; } }
    public double Current { get { return _current; } }
}

This class provides a very fast and lightweight implementation of a MovingAverage filter. It creates a circular buffer of Length N and computes one add, one subtract and one multiply per data-point appended, as opposed to the N multiply-adds per point for the brute force implementation.

Next, to LINQ-ify it!

internal static class MovingAverageExtensions
{
    public static IEnumerable<double> MovingAverage<T>(this IEnumerable<T> inputStream, Func<T, double> selector, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(selector(item));
            yield return ma.Current;
        }
    }

    public static IEnumerable<double> MovingAverage(this IEnumerable<double> inputStream, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(item);
            yield return ma.Current;
        }
    }
}

The above extension methods wrap the MovingAverage class and allow insertion into an IEnumerable stream.

Now to use it!

int period = 50;

// Simply filtering a list of doubles
IEnumerable<double> inputDoubles;
IEnumerable<double> outputDoubles = inputDoubles.MovingAverage(period);   

// Or, use a selector to filter T into a list of doubles
IEnumerable<Point> inputPoints; // assuming you have initialised this
IEnumerable<double> smoothedYValues = inputPoints.MovingAverage(pt => pt.Y, period);
like image 32
Dr. Andrew Burnett-Thompson Avatar answered Sep 21 '22 14:09

Dr. Andrew Burnett-Thompson