I have a list of angles and want to get rid of outliers. My first idea is to calculate the median. Unfortunately, there is the "wrap-around" problem: I don't know of a "correct" way to define the median for a set of angles (or clock positions).
My idea is to first calculate the mean, and use this to break the circle on the opposite side.
Example:
{6, 50, 52, 54, 60, 250} (in degrees, 0-360)
average ~ 39
new range [-141, 219) -> new order 250 (i.e. -110), 6, 50, 52, 54, 60
50 or 52 as median (the two middle values)
Is this a good approach, or are there better ones I don't know of?
Somewhat related: This Question showed ways to calculate the Mean of angles.
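The idea from the question can be sketched roughly as follows. This is only a minimal sketch: the function names are made up for illustration, the mean is taken from summed unit vectors, and non-empty input whose angles do not cancel out is assumed (otherwise the mean direction is undefined).

```python
import math
import statistics

def circular_mean_deg(angles):
    """Mean direction in degrees, in [0, 360), from summed unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    # Assumption: s and c are not both ~0, i.e. the angles do not cancel out.
    return math.degrees(math.atan2(s, c)) % 360.0

def median_cut_opposite_mean(angles):
    """Cut the circle opposite the mean, then take an ordinary median."""
    mean = circular_mean_deg(angles)
    # Map every angle into the half-open range [mean - 180, mean + 180).
    unwrapped = [(a - mean + 180.0) % 360.0 - 180.0 + mean for a in angles]
    # Bring the linear median back into [0, 360).
    return statistics.median(unwrapped) % 360.0

print(median_cut_opposite_mean([6, 50, 52, 54, 60, 250]))
```

For the example set, 250 is unwrapped to -110, so the two middle values are 50 and 52 and the sketch reports their midpoint.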
To find the median, first arrange the data in order of size. With an odd number of terms, the middle term is the median. With an even number of terms, add the two middle terms and divide the sum by 2; this mean of the two middle terms is the median.
I believe the following approach would make sense:
The median is the point where the sum of distances to all input points is minimal (for an even number of input points, there will be a whole range between two input points giving the same sum, so the middle of this range is usually taken). In the case of angles, which are periodic, the distance between two angles should be the smaller of the two possible arcs, that is, the one between 0 and pi.
As the minimum is attained at one of the input points (or at two consecutive ones in the even case, as explained above), there is an obvious O(n^2) algorithm for n angles. It seems that this can be improved to O(n log n) by sorting the angles, computing the sum of distances for the first one, and updating the sum for each consecutive angle while keeping track of where in the list the "antipode" of the base angle falls.
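A minimal sketch of the O(n^2) variant, working in degrees and assuming, as stated above, that the minimum is attained at one of the input angles. The function names are hypothetical, and the O(n log n) refinement is not attempted here.

```python
import math

def circular_distance(a, b):
    """Shortest arc between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def circular_median_candidates(angles):
    """Input angles that minimize the summed circular distance, O(n^2)."""
    # Cost of each candidate: sum of its arc distances to all inputs.
    costs = [sum(circular_distance(c, a) for a in angles) for c in angles]
    best = min(costs)
    # Several inputs can tie, e.g. the two middle ones for an even count.
    return [c for c, cost in zip(angles, costs) if math.isclose(cost, best)]

print(circular_median_candidates([6, 50, 52, 54, 60, 250]))  # [50, 52]
```

For the example from the question, 50 and 52 tie with a summed distance of 220, so the midpoint 51 would be a natural choice for the median.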
You can use the approach shown in the question you linked: calculate the average as the angle of the accumulated unit vectors of your angles. In my opinion, this approach is not very well suited to large sets of vectors.
There's another approach that works with weighted interpolations. It doesn't require any trigonometric functions, which means that you can work with your data in degrees without converting them to radians.
In this approach, all angles must be between 0° and 360°. If they lie outside, they must be brought into this range, e.g. -5° becomes 355°. Then you do a pairwise weighted average, where you adjust the angles when their difference is more than a semicircle, so that you always average over the shorter arc between the angles. After averaging, the resulting angle is brought back into the range 0° to 360°.
def angle_interpol(a1, w1, a2, w2):
    """Weighted average of two angles a1, a2 with weights w1, w2"""
    diff = a2 - a1
    # Shift a1 by a full turn so that we average over the shorter arc.
    if diff > 180: a1 += 360
    elif diff < -180: a1 -= 360
    aa = (w1 * a1 + w2 * a2) / (w1 + w2)
    # Bring the result back into the range [0, 360).
    if aa >= 360: aa -= 360
    elif aa < 0: aa += 360
    return aa

def angle_mean(angle):
    """Unweighted average of a list of angles"""
    if not angle: return 0
    aa = 0.0    # running average
    ww = 0.0    # accumulated weight of the running average
    for a in angle:
        aa = angle_interpol(aa, ww, a, 1)
        ww += 1
    return aa
If you look at your example {6°, 50°, 52°, 54°, 60°, 250°}, you'll notice that all points lie on the same semicircle between 250° (or -110°) and 70°. With the proposed averaging method, the average angle is 18.67°. This is also the linear average of {6, 50, 52, 54, 60, -110}, which seems reasonable. The median would be between 50 and 52. The outlier is still the angle at 250°, but it is closer to the average if you come from -110° than if you come from 250°.
Another example is {0°, 0°, 90°}. The vector approach calculates atan(0.5), i.e. approximately 26.6°, as the average. The proposed approach determines 30° as the average.
Calculating a circular average is only meaningful if your data is not evenly distributed in the feasible angle range. The arctan approach has a singularity if the angles cancel each other out; the approach proposed above just produces garbage.
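To make the singularity of the arctan approach visible, here is a small sketch that also returns the length of the averaged unit vector. The function name and the `eps` threshold are arbitrary choices for illustration, and a non-empty list is assumed.

```python
import math

def vector_mean_deg(angles, eps=1e-9):
    """Return (mean direction in [0, 360), resultant length R).

    The direction is None when R is (nearly) zero, i.e. when the unit
    vectors cancel out and no mean direction is defined.
    """
    n = len(angles)  # assumption: n > 0
    x = sum(math.cos(math.radians(a)) for a in angles) / n
    y = sum(math.sin(math.radians(a)) for a in angles) / n
    r = math.hypot(x, y)
    if r < eps:
        return None, r
    return math.degrees(math.atan2(y, x)) % 360.0, r

print(vector_mean_deg([0, 0, 90]))     # direction is about 26.57
print(vector_mean_deg([0, 120, 240]))  # direction is None, R is about 0
```

A resultant length close to 1 means the angles are tightly clustered; a value near 0 means an average direction says very little about the data.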
To get the median of a set of numbers, you sort them and then take the middle one. That is, if you have 7 numbers in sorted order, the median is the 4th number.
You could do the same with angles, but the result makes little sense because the concept of "first angle" is not well defined when you have more than one angle.
To define the first angle you could sort the angles and find the largest gap between consecutive angles. The angle next to the largest gap between two angles intuitively feels like a good candidate to be a "first" angle.
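A rough sketch of the largest-gap idea, in degrees. The function name is made up, the input is assumed to be non-empty, and ties between equally large gaps are broken by taking the first one.

```python
import statistics

def median_largest_gap(angles):
    """Median of angles in degrees, cutting the circle at the largest gap."""
    s = sorted(a % 360.0 for a in angles)
    n = len(s)
    # Gap from each angle to its successor, wrapping around at the end.
    gaps = [(s[(i + 1) % n] - s[i]) % 360.0 for i in range(n)]
    # The angle right after the largest gap becomes the "first" angle.
    start = (gaps.index(max(gaps)) + 1) % n
    first = s[start]
    # Unwrap so that values only increase, starting from the first angle.
    ordered = [first + (a - first) % 360.0 for a in s[start:] + s[:start]]
    return statistics.median(ordered) % 360.0

print(median_largest_gap([6, 50, 52, 54, 60, 250]))  # 51.0
```

For {6, 50, 52, 54, 60, 250} the largest gap (190°) lies between 60 and 250, so 250 becomes the first angle and the median lands between 50 and 52.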