 

Enumerable.Average and OverflowException

Tags: c#, linq

Perhaps a useless question:

public static double Average<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, int> selector
)

One of the exceptions thrown by the above method is OverflowException: "The sum of the elements in the sequence is larger than Int64.MaxValue."

I assume the reason for this exception is that the sum of the averaged values is computed using a variable (call it S) of type long. But since the return value is of type double, why didn't the designers choose to make S of type double as well?

Thank you

asked Apr 19 '11 by flockofcode


2 Answers

Because this particular overload knows that you're starting out with int values, it knows you're not using decimal values. Converting each of your values to a double and then adding the double values together would probably be less efficient, and would definitely open you up to the possibility of floating point imprecision issues if you had a large enough collection of values.
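For illustration, here is a rough sketch (not the actual framework source) of how an int-based overload can accumulate into a long; the checked addition is where the documented OverflowException would come from:

using System;
using System.Collections.Generic;

static class AverageSketch
{
    // Hypothetical sketch only, not the real BCL implementation: sum the ints
    // into a long with checked arithmetic, so a running sum past Int64.MaxValue
    // throws OverflowException, then do a single double division at the end.
    public static double AverageOfInts(this IEnumerable<int> source)
    {
        long sum = 0;
        long count = 0;
        foreach (int value in source)
        {
            checked { sum += value; }  // exact integer addition; throws on overflow
            ++count;
        }
        if (count == 0)
            throw new InvalidOperationException("Sequence contains no elements");
        return (double)sum / count;
    }
}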

Update

I just did a quick benchmark, and it takes over twice as long to average doubles as it does to average ints.
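The exact numbers will vary by machine and runtime, but a rough way to reproduce that kind of comparison is a simple Stopwatch test along these lines:

using System;
using System.Diagnostics;
using System.Linq;

class AverageBenchmark
{
    static void Main()
    {
        // Rough, unscientific timing comparison; numbers will vary by machine and runtime.
        int[] ints = Enumerable.Range(0, 10000000).ToArray();
        double[] doubles = ints.Select(i => (double)i).ToArray();

        var sw = Stopwatch.StartNew();
        double intAvg = ints.Average();        // the int overload sums into a long
        sw.Stop();
        Console.WriteLine("int average:    {0} in {1} ms", intAvg, sw.ElapsedMilliseconds);

        sw.Restart();
        double doubleAvg = doubles.Average();  // the double overload sums into a double
        sw.Stop();
        Console.WriteLine("double average: {0} in {1} ms", doubleAvg, sw.ElapsedMilliseconds);
    }
}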

answered by StriplingWarrior


First off, I note that the exception does not arise until you have exceeded the bounds of a long. How are you going to do that? Each int can be at most about two billion, and the top of a long is about nine billion billion, so that means you'd have to be taking the average of more than four billion ints, minimum, in order to trigger the exception. Is that the sort of problem you regularly have to solve?
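To put concrete numbers on that, the relevant constants work out like this (a quick sketch you can run yourself):

using System;

class OverflowThreshold
{
    static void Main()
    {
        // How many int.MaxValue-sized values does it take to overflow a long sum?
        Console.WriteLine(long.MaxValue);                  // 9223372036854775807 (about nine billion billion)
        Console.WriteLine(int.MaxValue);                   // 2147483647 (about two billion)
        Console.WriteLine(long.MaxValue / int.MaxValue);   // 4294967298 -> more than four billion values needed
    }
}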

Suppose for the sake of argument it is. Doing the math in doubles loses precision because double arithmetic is rounded off to about fifteen decimal places. Watch:

using System;
using System.Collections.Generic;
static class Extensions
{
    public static double DoubleAverage(this IEnumerable<int> sequence)
    {
        // Accumulate in a double instead of an exact long; each addition can round off.
        double sum = 0.0;
        long count = 0;
        foreach(int item in sequence) 
        {
            ++count;
            sum += item;
        }
        return sum / count;
    }
    public static IEnumerable<T> Concat<T>(this IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        foreach(T item in seq1) yield return item;
        foreach(T item in seq2) yield return item;
    }
}


class P
{
    public static IEnumerable<int> Repeat(int x, long count)
    {
        for (long i = 0; i < count; ++i) yield return x;
    }

    public static void Main()
    {
        System.Console.WriteLine(Repeat(1000000000, 10000000).Concat(Repeat(1, 90000000)).DoubleAverage()); 
        System.Console.WriteLine(Repeat(1, 90000000).Concat(Repeat(1000000000, 10000000)).DoubleAverage()); 
    }
}

Here we average with double arithmetic two series: one that is {a billion, a billion, a billion ... ten million times ... a billion, one, one, one ... ninety million times} and one that is the same sequence with the ones first and the billions last. If you run the code, you get different results. Not hugely different, but different, and the difference grows larger the longer the sequences get. Long arithmetic is exact; double arithmetic potentially rounds off on every calculation, and that means that massive error can accrue over time.

It seems very unexpected to do an operation solely on ints and have it accumulate floating point rounding error. That's the sort of thing one expects when doing an operation on floats, but not when doing it on ints.

answered by Eric Lippert