Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to reduce number of latitude and longitude points

I'm trying to reduce and combine a number of points to the center point of those locations. Right now I'm brute-forcing it by finding the closest pair, combining those and repeating until I've reduced it to my target (side note: actually I reduced the problem by sorting by (lat*lat+long*long) then searching 10% on either side of each point, which with my tests has always found the shortest distance within that range).

For an example I'd want to reduce 4000 points down to 1000, ideally combining the closest points into the center of those closest points. Basically to build marker points that reflect the number of addresses in that area.

Is there any better algorithm that would give me as accurate as possible results? Or a quicker distance algorithm? I guess it does only need to be accurate at short distances


Right now I'm finding the distance with (Wikipedia had it under "spherical earth projected to a plane"):

double dLat = pos2.LatitudeR - pos1.LatitudeR;
double dLon = pos2.LongitudeR - pos1.LongitudeR;

double cosLatM = Math.Cos((pos2.LatitudeR + pos1.LatitudeR)/2) * dLon;
double a = dLat*dLat + cosLatM*cosLatM;

I've thought about grouping all the points within x distance of each other, then expanding x until I get to my target number of final points, but I'm not sure how to make that as accurate as my perfectionism would want it. That is all the ways I can think of would vary slightly depending on the order of the input list of points.


Edit to describe how my current algorithm processes (This is the ideal way to find the results I want, but an approximation that is much quicker would be worth it):

Describing it linearly, if you had x=1,4,5,6,10,20,22

  1. It would combine 4+5=4.5 [first 1.0 distance it finds]
  2. (4.5*2+6)/3=5 -- x=1,5,10,20,22 [1.5 distance]
  3. 20+22=21 -- x=1,5,10,21 [2.0 distance]
  4. (5*3+1)/4=4 -- x=4,10,21 [4.0 distance]
  5. (4*4+10)/5.2 -- So you'd end up with x=5.2,21. (It keeps track of the CombineCount so it can find the correct average center in this way)

Results: Here is my current Distance function, with lookup table generation for cos^2. Haven't had time to check how close together my points are, so haven't implemented Joey's suggestion of approximating the cos^2, but that could improve the speed over the lookup table here.

The K-Cluster algorithm I tried (see my comment on that answer) did not combine them as I wanted, it ended up with a ton of points near the center of the map and few points toward the edges. So unless I can correct that I'm using my algorithm which is slower.

public static double Distance(AddressCoords pos1, AddressCoords pos2, DistanceType type)
{
    if (LookupTable == null) LookupTable = BuildLookup();

    double R = (type == DistanceType.Miles) ? 3960 : 6371;

    double dLat = pos2.LatitudeR - pos1.LatitudeR;
    double dLon = pos2.LongitudeR - pos1.LongitudeR;

    double LatM = ((pos2.LatitudeR + pos1.LatitudeR)/2);
    if (LatM < 0) LatM = -LatM; //Don't allow any negative radian values
    double cosLatM2 = LookupTable[(int)(LatM * _cacheStepInverse)];
    double a = dLat*dLat + cosLatM2 * dLon*dLon;

    //a = Math.Sqrt(a);

    double d = a * R;

    return d;
}

private const double _cacheStep = 0.00002;
private const double _cacheStepInverse = 50000;

private static double[] LookupTable = null;

public static double[] BuildLookup()
{
    // set up array
    double maxRadian = Math.PI*2;
    int elements = (int)(maxRadian * _cacheStepInverse) + 1;

    double[] _arrayedCos2 = new double[elements];
    int i = 0;
    for (double angleRadians = 0; angleRadians <= maxRadian;
        angleRadians += _cacheStep)
    {
        double cos = Math.Cos(angleRadians);
        _arrayedCos2[i] = cos*cos;
        i++;
    }
    return _arrayedCos2;
}
like image 229
Thymine Avatar asked Oct 04 '11 04:10

Thymine


2 Answers

To speed up working out distances between points:

If you do some elementary algebra you get:

D = R*Sqrt(Lat2^2 + Lat1^2 - 2*Lat1*Lat2 + cos^2((Lat2 + Lat1) /2)(Lon2^2 + Lon1^2 - 2*Lon1*Lon2))

The first thing you can do to speed this up is normalise to the radius of the Earth (R) and compare squared distances rather than distances, thus avoiding the square root and R term, saving yourself 2 calculations per comparison. Leaving:

valToCompare = Lat2^2 + Lat1^2 - 2*Lat1*Lat2 + cos^2((Lat2 + Lat1) /2)(Lon2^2 + Lon1^2 - 2*Lon1*Lon2)

Another thing you could do is precalculate Lat^2 and Lon^2 for each coordinate - reducing the number of calculations for each comparison by 4.

Further, if the points are all relatively close together in latitude you could take an approximation for the cos^2 term by precalculating it using the latitude of a random point or the average latitude of all the points, rather than the average latitude of the two points being compared. This reduces the number of calculations for each comparison by another 4.

Finally, you could pre-calculate 2*Lat and 2*Lon for each point cutting out 2 more calculations for each comparison.

None of this improves your algorithm per se but it should make it run faster and can be applied to any algorithm that needs to compare the distances between points.

like image 164
Joey Avatar answered Oct 18 '22 11:10

Joey


Have you considered looking at K-Cluster algorithms?

These kind of algorithms are used to "group" close/related objects (in your case, points) into clusters, based on their nearest Mean. These algorithms are usually quite optimized, and are built to handle large amount of data. In the case of 4000 points -> 1000 points, you would run a 1000-Cluster run on your data, and get back 1000 groups of points, each can be merged to a single point.

like image 42
SaguiItay Avatar answered Oct 18 '22 11:10

SaguiItay