I'm looking for an algorithm that can quickly run through a short (<30 element) array and merge points that are approximately equal. It'll probably end up being some sort of segmentation algorithm.
The context is as follows: I'm looking for the tallest peaks in a dataset. I've already separated the tallest maximums from the dross using a one-dimensional implementation of J-SEG, but anywhere where the dataset is "flat," I get back a point for every element along the plateau. I need to be able to adaptively merge these points to a single point at the center of the plateau. (This also means I don't know how many clusters there will be.)
Sample dataset 1 (Sample/Artificial input) Input:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
152 1625624905272.95
160 1625625128029.81
166 1625625152145.47
176 1625625104745.48
179 1625625127365.09
183 1625625152208.44
190 1625624974205.81
194 21068100428092.9
247 54686024895222.1
Ideal Output:
97 54686024814922.8
118 406406320535.935
148 24095826539423.7
159 1625625061816.08
182 1625625089631.21
194 21068100428092.9
247 54686024895222.1
Sample dataset 2 (Real input): Input:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
154 192912098976702
156 195042451997727
161 195221254214493
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Ideal output:
2 196412376940671
123 206108518197124
135 194488685387149
148 178463949513298
157 194391935062974
168 204760073508681
172 189240741651297
182 191554457423846
187 215014126955355
201 202294866774063
Sample Dataset 3 (Real input) Input:
2 299777367852602
26 263467434856928
35 293412234811901
83 242768805551742
104 226333969841383
107 227548774800053
178 229173574175201
181 229224441416751
204 244334414017228
206 245258151638118
239 198782930497571
Ideal output:
2 299777367852602
26 263467434856928 (May be merged
35 293412234811901 depending on parameters)
83 242768805551742
105.5 226941372320718
179.5 229199007795976
205 244796282827673
239 198782930497571
(Will edit in further information as needed.)
I'm not sure if this is exactly what you want, but there aren't any other answers posted yet so here we go.
I looked at it from the perspective of a graph. If I were looking at a graph and I wanted to determine which points were horizontally similar that would end up being relative to the graphs scale. So I made a function that accepts a percentage of the scale that you want to be considered the same. It then takes that percentage and multiplies it by the maximum difference between your dataset.
Additionally the similar value is always compared against the average of the currently located plateau. Once a a plateau is detected to end it adds the x's together and divides by 2 to get the middle and then takes the average y value and adds it as a final data point.
Without having access to good sample data all I have to go on is my very poor data generator that I made. But in my testing values within 1% generally eliminated about half of my data points.
Now it's important to note that this is one dimensional, x distance is completely ignored. You could very easily expand it to be two dimension as well. Also something else that I considered doing is instead of outputting a single data point to represent plateaus you could output a start and end point of the average instead.
namespace PointCondenser
{
public static class Extensions
{
static public bool AlmostEqual<T>(this T value, T value2, T epsilon)
{
return (Math.Abs((dynamic)value - value2) < epsilon);
}
}
public struct Point
{
public Point(double x, double y)
{
X = x;
Y = y;
}
public override string ToString()
{
return string.Format("{0}\t{1}", X, Y);
}
public double X;
public double Y;
}
class Program
{
static public Point RandomYPoint(int i)
{
var r = new Random();
var r2 = new Random(i);
var variance = r2.NextDouble() / 100;
return new Point(i, Math.Abs(r.NextDouble() - variance) * 100);
}
static public IEnumerable<Point> SmoothPoints(IEnumerable<Point> points, double percent)
{
if (percent <= 0 || percent >= 1)
throw new ArgumentOutOfRangeException("percent", "Percentage outside of logical bounds");
var final = new List<Point>();
var apoints = points.ToArray();
var largestDifference = apoints.Max(x => x.Y) - apoints.Min(x => x.Y);
var epsilon = largestDifference * percent;
var currentPlateau = new List<Point> { apoints[0] };
for (var i = 1; i < apoints.Length; ++i)
{
var point = apoints[i];
if (point.Y.AlmostEqual(currentPlateau.Average(x => x.Y), epsilon))
currentPlateau.Add(point);
else
{
var x = (currentPlateau[0].X + currentPlateau[currentPlateau.Count - 1].X) / 2.0;
var y = currentPlateau.Average(z => z.Y);
currentPlateau.Clear();
currentPlateau.Add(point);
final.Add(new Point(x, y));
}
}
return final;
}
static void Main(string[] args)
{
var r = new Random();
var points = new List<Point>();
for (var i = 0; i < 100; ++i)
{
for (var n = 0; n < r.Next(1, 5); ++n)
{
var p = RandomYPoint(points.Count);
points.Add(p);
Console.WriteLine(p);
}
Thread.Sleep(r.Next(10, 250));
}
Console.Write("\n\n Condensed \n\n");
var newPoints = SmoothPoints(points, .01);
foreach (var p in newPoints)
Console.WriteLine(p);
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With