My current code is pretty quick, but I need to make it even faster so we can accommodate even more markers. Any suggestions?
Notes:
The code:
$singleMarkers = array();
$clusterMarkers = array();
while (count($markers)) {
$marker = array_pop($markers);
$cluster = array();
// Compare marker against all remaining markers.
foreach ($markers as $key => $compareMarker) {
// This function returns the distance between two markers, at a defined zoom level.
$pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
// If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
if ($pixels < $distance) {
unset($markers[$key]);
$cluster[] = $compareMarker;
}
}
// If a marker was added to cluster, also add the marker we were comparing to.
if (count($cluster) > 0) {
$cluster[] = $marker;
$clusterMarkers[] = $cluster;
} else {
$singleMarkers[] = $marker;
}
}
function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
$x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
$y1 = $lat1*10000000;
$x2 = $lon2*10000000;
$y2 = $lat2*10000000;
return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}
UPDATE
Here's the current code:
$singleMarkers = array();
$clusterMarkers = array();
// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;
// Loop until all markers have been compared.
while (count($markers)) {
$marker = array_pop($markers);
$cluster = array();
// Compare against all markers which are left.
foreach ($markers as $key => $target) {
$pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);
// If the two markers are closer than given distance remove target marker from array and add it to cluster.
if ($pixels < $DISTANCE) {
unset($markers[$key]);
$cluster[] = $target;
}
}
// If a marker has been added to cluster, add also the one we were comparing to.
if (count($cluster) > 0) {
$cluster[] = $marker;
$clusterMarkers[] = $cluster;
} else {
$singleMarkers[] = $marker;
}
}
A cluster or map combines the two stages of brainstorming (recording ideas and then grouping them) into one. It also allows you to see, at a glance, the aspects of the subject about which you have the most to say, so it can help you choose how to focus a broad subject for writing.
Each cluster is represented by a circle with a diameter that is directly proportional to the number of markers it represents which, in turn, is also numerically represented. Clicking on a cluster will show you the individual markers or sub-clusters.
k-means is the most widely-used centroid-based clustering algorithm. Centroid-based algorithms are efficient but sensitive to initial conditions and outliers.
Cluster mapping creates a dataset on the presence of clusters across geographies, based on a standardized set of benchmark cluster definitions that group individual industries uniquely into cluster categories.
Do you actually need to calculate the Euclidean distance? If you are just comparing relative magnitudes of distances, you can probably get away with using the Manhattan distance, which will save you two calls to pow()
and one to sqrt()
:
function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
$x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
$y1 = $lat1*10000000;
$x2 = $lon2*10000000;
$y2 = $lat2*10000000;
return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}
Not sure if you need the >> (21 - $zoom)
bit for your calculations, so I left it in. But unless you actually need to use the calculated distance values elsewhere, you can probably get away with just using the latitude/longitude directly (no need to multiply by anything) and taking the Manhattan distance, assuming you pre-calculate $distance
to fit in with that measure, which will be a lot cheaper in computational terms than coercing all the distances to fit in with the units and magnitude of $distance
.
EDIT: When I was researching this problem last year, I found some useful stuff on Wikipedia - yes, it can happen ;-)
I can also highly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications which goes into clustering in great depth, as applied not only to geographical data but also to other areas of data analysis.
Expanding on what John said, I think you should try inlining that function. Function calls in PHP are very slow, so you should get a decent speedup from that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With