Perhaps a useless question:
public static double Average<TSource>(
    this IEnumerable<TSource> source,
    Func<TSource, int> selector
)
One of the exceptions thrown by the above method is OverflowException: "The sum of the elements in the sequence is larger than Int64.MaxValue."
I assume the reason for this exception is that the sum of the averaged values is computed using a variable S of type long? But since the return value is of type double, why didn't the designers choose to make S also of type double?
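For context, my mental model of how this overload accumulates is roughly the following (just a sketch to illustrate the question; the AverageSketch name and class are mine, not the actual BCL code):

using System;
using System.Collections.Generic;

static class AverageSketchExtensions
{
    // Sketch only: the running sum is kept in a long, so checked addition
    // throws OverflowException once the sum passes Int64.MaxValue.
    public static double AverageSketch<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, int> selector)
    {
        long sum = 0;
        long count = 0;
        foreach (TSource item in source)
        {
            checked { sum += selector(item); }
            ++count;
        }
        if (count == 0)
            throw new InvalidOperationException("Sequence contains no elements");
        return (double)sum / count;
    }
}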
Thank you
Because this particular overload knows that you're starting out with int values, it knows you're not using decimal values. Converting each of your values to a double and then adding the double values together would probably be less efficient, and would definitely open you up to the possibility of floating point imprecision issues if you had a large enough collection of values.
I just did a quick benchmark, and it takes over twice as long to average doubles as it does to average ints.
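For what it's worth, here is a rough sketch of the kind of comparison I ran (a quick-and-dirty harness, not a rigorous benchmark; the exact ratio will vary by machine and runtime):

using System;
using System.Diagnostics;

static class AveragingBenchmark
{
    static void Main()
    {
        const int N = 50000000;
        var data = new int[N];
        var rng = new Random(12345);
        for (int i = 0; i < N; i++) data[i] = rng.Next();

        // Accumulate in a long, the way the int overload conceptually does.
        var sw = Stopwatch.StartNew();
        long longSum = 0;
        foreach (int x in data) longSum += x;
        double longAvg = (double)longSum / N;
        sw.Stop();
        Console.WriteLine($"long accumulator:   {sw.ElapsedMilliseconds} ms, avg = {longAvg}");

        // Accumulate in a double, converting each element first.
        sw = Stopwatch.StartNew();
        double doubleSum = 0.0;
        foreach (int x in data) doubleSum += x;
        double doubleAvg = doubleSum / N;
        sw.Stop();
        Console.WriteLine($"double accumulator: {sw.ElapsedMilliseconds} ms, avg = {doubleAvg}");
    }
}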
First off, I note that the exception does not arise until you have exceeded the bounds of a long. How are you going to do that? Each int can be at most about two billion, and the top of a long is about nine billion billion, so that means you'd have to be taking the average of more than four billion ints, minimum, in order to trigger the exception. Is that the sort of problem you regularly have to solve?
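To put concrete numbers on that:

// Int64.MaxValue is 9,223,372,036,854,775,807; Int32.MaxValue is 2,147,483,647.
// Even a sequence consisting entirely of int.MaxValue needs more than this many
// elements before the running long total can overflow:
System.Console.WriteLine(long.MaxValue / int.MaxValue); // 4294967298 -- just over 4.29 billion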
Suppose for the sake of argument it is. Doing the math in doubles loses precision because double arithmetic is rounded off to about fifteen decimal places. Watch:
using System;
using System.Collections.Generic;

static class Extensions
{
    public static double DoubleAverage(this IEnumerable<int> sequence)
    {
        double sum = 0.0;
        long count = 0;
        foreach (int item in sequence)
        {
            ++count;
            sum += item;
        }
        return sum / count;
    }

    public static IEnumerable<T> Concat<T>(this IEnumerable<T> seq1, IEnumerable<T> seq2)
    {
        foreach (T item in seq1) yield return item;
        foreach (T item in seq2) yield return item;
    }
}

class P
{
    public static IEnumerable<int> Repeat(int x, long count)
    {
        for (long i = 0; i < count; ++i) yield return x;
    }

    public static void Main()
    {
        System.Console.WriteLine(Repeat(1000000000, 10000000).Concat(Repeat(1, 90000000)).DoubleAverage());
        System.Console.WriteLine(Repeat(1, 90000000).Concat(Repeat(1000000000, 10000000)).DoubleAverage());
    }
}
Here we use double arithmetic to average two series: one that is {a billion, a billion, a billion ... ten million times ... a billion, one, one, one ... ninety million times} and one that is the same sequence with the ones first and the billions last. If you run the code, you get different results. Not hugely different, but different, and the difference will become larger and larger the longer the sequences get. Long arithmetic is exact; double arithmetic potentially rounds off on every calculation, and that means that massive error can accrue over time.
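For contrast, here is a long-based version (a sketch of my own; LongAverage is my name, not a BCL method) that can be dropped into the Extensions class above and reuses the Repeat and Concat helpers. Because long addition never rounds, both orderings print exactly the same value:

public static double LongAverage(this IEnumerable<int> sequence)
{
    long sum = 0;
    long count = 0;
    foreach (int item in sequence)
    {
        checked { sum += item; } // exact: fails only by throwing, never by rounding
        ++count;
    }
    return (double)sum / count;
}

// In Main:
// System.Console.WriteLine(Repeat(1000000000, 10000000).Concat(Repeat(1, 90000000)).LongAverage());
// System.Console.WriteLine(Repeat(1, 90000000).Concat(Repeat(1000000000, 10000000)).LongAverage());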
It seems very unexpected that an operation performed solely on ints results in an accumulation of floating point rounding error. That is the sort of thing one expects when operating on floats, but not when operating on ints.