I have several ordered List of X/Y Pairs and I want to calculate a ordered List of X/Y Pairs representing the average of these Lists.
All these Lists (including the "average list") will then be drawn onto a chart (see example picture below).
I have several problems:
I need to implement this in C#, altought I guess that's not really important for the algorithm itself.
Sorry, that I can't explain my problem in a more formal or mathematical way.
EDIT: I replaced the term "function" with "List of X/Y Pairs" which is less confusing.
AutoSum lets you find the average in a column or row of numbers where there are no blank cells. Click a cell below the column or to the right of the row of the numbers for which you want to find the average. On the HOME tab, click the arrow next to AutoSum > Average, and then press Enter.
To calculate an average, add up all the terms, and then divide by the number of terms you added. The result is the (mean) average.
To summarize, to compute the average value of f(x) over [a,b], compute the integral of f over the interval and divide by the length of the interval: average=1b−a∫baf(x)dx.
Returns the average (arithmetic mean) of the arguments. For example, if the range A1:A20 contains numbers, the formula =AVERAGE(A1:A20) returns the average of those numbers.
I would use the method Justin proposes, with one adjustment. He suggests using a mappingtable with fractional indices, though I would suggest integer indices. This might sound a little mathematical, but it's no shame to have to read the following twice(I'd have to too). Suppose the point at index i in a list of pairs A has searched for the closest points in another list B, and that closest point is at index j. To find the closest point in B to A[i+1] you should only consider points in B with an index equal to or larger than j. It will probably by j + 1, but could be j or j + 2, j + 3 etc, but never below j. Even if the point closest to A[i+1] has an index smaller than j, you still shouldn't use that point to interpolate with, since that would result in an unexpected average and graph. I'll take a moment now to create some sample code for you. I hope you see that this optimalization makes sense.
EDIT: While implementing this, I realised that j is not only bounded from below(by the method described above), but also bounded from above. When you try the distance from A[i+1] to B[j], B[j+1], B[j+2] etc, you can stop comparing when the distance A[i+1] to B[j+...] stops decreasing. There's no point in searching further in B. The same reasoning applies as when j was bounded from below: even if some point elsewhere in B would be closer, that's probably not the point you want to interpolate with. Doing so would result in an unexpected graph, probably less smooth than you'd expect. And an extra bonus of this second bound is the improved performance. I've created the following code:
IEnumerable<Tuple<double, double>> Average(List<Tuple<double, double>> A, List<Tuple<double, double>> B)
{
if (A == null || B == null || A.Any(p => p == null) || B.Any(p => p == null)) throw new ArgumentException();
Func<double, double> square = d => d * d;//squares its argument
Func<int, int, double> euclidianDistance = (a, b) => Math.Sqrt(square(A[a].Item1 - B[b].Item1) + square(A[a].Item2 - B[b].Item2));//computes the distance from A[first argument] to B[second argument]
int previousIndexInB = 0;
for (int i = 0; i < A.Count; i++)
{
double distance = euclidianDistance(i, previousIndexInB);//distance between A[i] and B[j - 1], initially
for (int j = previousIndexInB + 1; j < B.Count; j++)
{
var distance2 = euclidianDistance(i, j);//distance between A[i] and B[j]
if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
{
distance = distance2;
previousIndexInB = j;
}
else
{
break;//don't place the yield return statement here, because that could go wrong at the end of B.
}
}
yield return LinearInterpolation(A[i], B[previousIndexInB]);
}
}
Tuple<double, double> LinearInterpolation(Tuple<double, double> a, Tuple<double, double> b)
{
return new Tuple<double, double>((a.Item1 + b.Item1) / 2, (a.Item2 + b.Item2) / 2);
}
For your information, the function Average returns the same amount of interpolated points the list A contains, which is probably fine, but you should think about this for your specific application. I've added some comments in it to clarify some details, and I've described all aspects of this code in the text above. I hope it's clear, and otherwise feel free to ask questions.
SECOND EDIT: I misread and thought you had only two lists of points. I have created a generalised function of that above accepting multiple lists. It still uses only those principles explained above.
IEnumerable<Tuple<double, double>> Average(List<List<Tuple<double, double>>> data)
{
if (data == null || data.Count < 2 || data.Any(list => list == null || list.Any(p => p == null))) throw new ArgumentException();
Func<double, double> square = d => d * d;
Func<Tuple<double, double>, Tuple<double, double>, double> euclidianDistance = (a, b) => Math.Sqrt(square(a.Item1 - b.Item1) + square(a.Item2 - b.Item2));
var firstList = data[0];
for (int i = 0; i < firstList.Count; i++)
{
int[] previousIndices = new int[data.Count];//the indices of points which are closest to the previous point firstList[i - 1].
//(or zero if i == 0). This is kept track of per list, except the first list.
var closests = new Tuple<double, double>[data.Count];//an array of points used for caching, of which the average will be yielded.
closests[0] = firstList[i];
for (int listIndex = 1; listIndex < data.Count; listIndex++)
{
var list = data[listIndex];
double distance = euclidianDistance(firstList[i], list[previousIndices[listIndex]]);
for (int j = previousIndices[listIndex] + 1; j < list.Count; j++)
{
var distance2 = euclidianDistance(firstList[i], list[j]);
if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
{
distance = distance2;
previousIndices[listIndex] = j;
}
else
{
break;
}
}
closests[listIndex] = list[previousIndices[listIndex]];
}
yield return new Tuple<double, double>(closests.Select(p => p.Item1).Average(), closests.Select(p => p.Item2).Average());
}
}
Actually that I did the specific case for 2 lists separately might have been a good thing: it is easily explained and offers a step before understanding the generalised version. Furthermore, the square root could be taken out, since it doesn't change the order of the distances when sorted, just the lengths.
THIRD EDIT: In the comments it became clear there might be a bug. I think there are none, aside from the mentioned small bug, which shouldn't make any difference except for at the end of the graphs. As a proof that it actually works, this is the result of it(the dotted line is the average):
I'll use a metaphor of your functions being cars racing down a curvy racetrack, where you want to extract the center-line of the track given the cars' positions. Each car's position can be described as a function of time:
p1(t) = (x1(t), y1(t))
p2(t) = (x2(t), y2(t))
p3(t) = (x3(t), y3(t))
The crucial problem is that the cars are racing at different speeds, which means that p1(10)
could be twice as far down the race track as p2(10)
. If you took a naive average of these two points, and there was a sharp curve in the track between the cars, the average may be far from the track.
If you could just transform your functions to no longer be a function of time, but a function of the distance along the track, then you would be able to do what you want.
One way you could do this would be to choose the slowest car (i.e., the one with the greatest number of samples). Then, for each sample of the slowest car's position, look at all of the other cars' paths, find the two closest points, and choose the point on the interpolated line which is closest to the slowest car's position. Then average these points together. Once you do this for all of the slow car's samples, you have an average path.
I'm assuming that all of the cars start and end in roughly the same places; if any of the cars just race a small portion of the track, you will need to add some more logic to detect that.
A possible improvement (for both performance and accuracy), is to keep track of the most recent sample you are using for each car and the speed of each car (the relative sampling rate). For your slowest car, it would be a simple map: 1 => 1, 2 => 2, 3 => 3, ... For the other cars, though, it could be more like: 1 => 0.3, 2 => 0.7, 3 => 1.6 (fractional values are due to interpolation). The speed would be the inverse of the change in sample number (e.g., the slow car would have speed 1, and the other car would have speed 1/(1.6-0.7)=1.11). You could then ensure that you don't accidentally backtrack on any of the cars. You could also improve the calculation speed because you don't have to search through the whole set of all points on each path; instead, you can assume that the next sample will be somewhere close to the current sample plus 1/speed.
As these are not y=f(x)
functions, are they perhaps something like (x,y)=f(t)
?
If so, you could interpolate along t, and calculate avg(x) and avg(y) for each t.
EDIT This of course assumes that t can be made available to your code - so that you have an ordered list of T/X/Y triples.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With