Server-side clustering for Google Maps API v3

I am currently developing a Google Maps overview widget that displays locations as markers on the map. The number of markers varies from several hundred up to thousands (10,000 and up). Right now I am using MarkerClusterer for Google Maps v3 1.0 and the Google Maps JavaScript API v3 (Premier), and it works pretty decently for, let's say, a hundred markers. Because the number of markers will increase, I need a new way of clustering them. From what I have read, the only way to keep performance up is to move the clustering from the client side to the server side. Does anyone know a good PHP5 library that can do this for me?

At the moment I am digging deeper into the layer mechanisms of Google Maps. Maybe there are also a few leading PHP libraries I could start to check out? I also ran across FusionTables, but since I need clustering I think this might not be the right solution.

Thanks in advance!

asked Sep 23 '11 12:09

mayrs

1 Answer

I don't know of a server-side library that'll do the job for you. I can however give you some pointers on how to implement one yourself.

The basic approach to clustering is simply to calculate the distance between your markers and when two of them are close enough you replace them with a single marker located at the mid-point between the two.
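A minimal sketch of that basic approach, written in Python for brevity (the logic should port to PHP directly). It assumes markers are plain (x, y) tuples in a planar coordinate system such as pixel coordinates at one zoom level; the function name `cluster_by_distance` and the `max_dist` parameter are made up for illustration.

```python
import math


def cluster_by_distance(points, max_dist):
    """Merge any two points closer than max_dist into their mid-point.

    points: list of (x, y) tuples in planar units (an assumption; raw
    latitude/longitude would need a proper distance function).
    """
    clusters = list(points)
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if math.hypot(a[0] - b[0], a[1] - b[1]) < max_dist:
                    # Replace the pair with a single marker at the mid-point.
                    clusters[i] = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
                    del clusters[j]  # j > i, so index i stays valid
                    merged = True
                    break
            if merged:
                break  # restart the scan with the updated list
    return clusters
```

For example, `cluster_by_distance([(0, 0), (1, 0), (10, 10)], 2)` merges the first two points and leaves the third alone.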

Instead of just having a limitation on how close to each other markers may be, you may also (or instead) choose to limit the number of clusters/markers you want as a result.

To accomplish this you could calculate the distance between all pairs of markers, sort them, and then merge from the top until you only have as many markers/clusters as you wish.
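One way this could look, again as a Python sketch under the assumption of planar (x, y) tuples. Instead of sorting all pair distances once, this variant re-finds the closest pair after every merge, because a merge moves the resulting marker and changes its distances. `cluster_to_count` and `target` are illustrative names.

```python
import math
from itertools import combinations


def cluster_to_count(points, target):
    """Merge the closest pair repeatedly until at most `target` markers remain."""
    clusters = list(points)
    while len(clusters) > max(target, 1):
        # Brute-force search for the closest pair of indices (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: math.dist(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        clusters[i] = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

The brute-force pair search is what makes this simple version expensive for large inputs.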

To refine the mid-point positioning when forming a cluster you may take into account the number of actual markers represented by each of the two to be merged. Think of that number as a weight and the line between the two markers as a scale. Then instead of always choosing the mid-point, choose the point that would balance the scale.
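The balance point is just a weighted average of the two positions. A small Python sketch, assuming each marker or cluster is represented as an (x, y, count) tuple (a representation chosen here for illustration):

```python
def merge_weighted(a, b):
    """Merge two (x, y, count) clusters at the point that balances their weights."""
    ax, ay, an = a
    bx, by, bn = b
    total = an + bn
    # Weighted average: the heavier cluster pulls the result toward itself.
    return ((ax * an + bx * bn) / total, (ay * an + by * bn) / total, total)
```

Merging a cluster of three markers at (0, 0) with a single marker at (4, 0) gives `merge_weighted((0, 0, 3), (4, 0, 1))`, which is `(1.0, 0.0, 4)`: the result sits a quarter of the way along the line, closer to the heavier cluster.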

I'd guess that this simple form of clustering is good enough if you have a limited number of markers. If your data set (the number of markers and their positions) is roughly static, you can calculate the clustering on the server once in a while, cache it, and serve clients directly from the cache.

However, if you need to support large scale scenarios potentially with markers all over the world you'll need a more sophisticated approach.

The clustering algorithm described above does not scale. Comparing all pairs of markers alone costs time quadratic in the number of markers, and repeating that comparison after each merge makes it worse still.

To remedy this you could split the world into partitions and calculate clustering and serve clients from each partition. This would indeed support scaling since the workload can be split and performed by several (roughly) independent servers.

The question then is how to find a good partitioning scheme. You may also want to consider providing different clustering of markers at different zoom levels, and your partitioning scheme should incorporate this as well to allow scaling.

Google divides the map into tiles with x, y and z-coordinates, where x and y are the horizontal and vertical position of the tile starting from the north-west corner of the map, and where z is the zoom level.

At the minimum zoom level (zero) the entire map consists of a single tile (all tiles are 256x256 pixels). At the next zoom level that tile is divided into four sub tiles. This continues, so that in zoom level 2 each of those four tiles has been divided into four sub tiles, which gives us a total of 16 tiles. Zoom level 3 has 64 tiles, level 4 has 256 tiles, and so on. (The number of tiles on any zoom level can be expressed as 4^z.)

Using this partitioning scheme you could calculate clustering per tile starting at the most detailed zoom level (highest z-coordinate), bubbling up until you reach the single top tile at zoom level zero.
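To assign a marker to a tile you need to map its latitude/longitude to x, y at a given z. The sketch below uses the standard Web Mercator tile formula that Google-style tile schemes follow; the function names are illustrative, and the clamping at the edges is an added safety assumption.

```python
import math


def tile_count(z):
    """Total number of tiles at zoom level z (4^z)."""
    return 4 ** z


def latlng_to_tile(lat, lng, z):
    """Return the (x, y, z) tile containing a point, using Web Mercator.

    Latitude is assumed to be within roughly +/-85.05 degrees, the usual
    Web Mercator limit.
    """
    n = 2 ** z  # tiles per axis at this zoom level
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge values such as lng == 180 stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1), z
```

At zoom level 1, the point at latitude 0, longitude 0 falls in the south-east tile: `latlng_to_tile(0.0, 0.0, 1)` returns `(1, 1, 1)`.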

The set of markers to be clustered for a single tile is the union of all markers (some of which may represent clusters) of its four sub tiles.
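A sketch of one bubbling-up step in Python. It relies on the fact that in this tile scheme the parent of tile (x, y, z) is (x // 2, y // 2, z - 1). The dictionary layout and the `cluster_fn` callback (whatever clustering routine you apply per tile) are assumptions made for illustration.

```python
from collections import defaultdict


def parent_tile(x, y, z):
    """The tile one zoom level up that contains tile (x, y, z)."""
    return x // 2, y // 2, z - 1


def roll_up(tiles, cluster_fn):
    """Build the clustered tiles for zoom z - 1 from those at zoom z.

    tiles: {(x, y, z): [markers]} with every key at the same zoom level.
    cluster_fn: hypothetical per-tile clustering function, list -> list.
    """
    parents = defaultdict(list)
    for (x, y, z), markers in tiles.items():
        # Each parent collects the union of its (up to four) sub tiles.
        parents[parent_tile(x, y, z)].extend(markers)
    return {key: cluster_fn(markers) for key, markers in parents.items()}
```

Calling `roll_up` repeatedly, from the deepest zoom level up to zero, yields one clustered marker set per tile per zoom level, and each call only ever looks at one tile's worth of markers at a time.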

This gives you a limited computational cost and also gives you a nice way of chunking up the data to be sent to the client. Instead of requesting all markers for a given zoom level (which would not scale) clients can request markers on a tile-by-tile basis as they are loaded into the map.

There is however a flaw in this approach: Consider two adjacent tiles, one to the left and one to the right. If the left tile contains a marker/cluster at its far right side and the right tile contains a marker/cluster at its far left side, then those two markers/clusters should be merged but won't be since we're performing the clustering mechanism for each tile individually.

To remedy this you could post-process tiles after they have been clustered so that you merge markers/clusters that lie on each of the four edges, taking into account each of the eight adjacent tiles for a given tile. This post-merging mechanism will only work if we can assume that no single cluster is large enough to affect the surrounding markers which are not in the same sub tile. This is, however, a reasonable assumption.
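For the post-processing step you first need to know which tiles are adjacent. A Python sketch that lists them; wrapping x around at the date line and dropping tiles beyond the top and bottom rows are assumptions of this sketch, as is the name `neighbor_tiles`.

```python
def neighbor_tiles(x, y, z):
    """Return the tiles adjacent to (x, y, z), at most eight, as a sorted list."""
    n = 2 ** z  # tiles per axis at this zoom level
    result = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny = y + dy
            if not 0 <= ny < n:
                continue  # nothing above the top row or below the bottom row
            nx = (x + dx) % n  # assumed: the map wraps horizontally
            if (nx, ny) != (x, y):
                result.add((nx, ny, z))
    return sorted(result)
```

An interior tile has eight neighbors, a tile in the top or bottom row has five, and at very low zoom levels the wrap-around collapses some of them into the same tile.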

As a final note: With the scaled out approach you'll have clients making several small requests. These requests will have locality (i.e. tiles are not randomly requested, but instead tiles that are geographically close to each other are also typically accessed together).

To improve lookup/query performance you would benefit from using search keys (representing the tiles) that also have this locality property (since this would store data for adjacent tiles in adjacent data blocks on disk - improving read time and cache utilization).

You can form such a key using the tile/sub tile partitioning scheme. Let the top tile (the single one spanning the entire map) have the empty string as key. Next, let each of its sub tiles have the keys A, B, C and D. The next level would have keys AA, AB, AC, AD, BA, BB, ..., DC, DD.

Apply this recursively and you'll end up with a partitioning key that identifies your tiles, allows quick transformation to x,y,z-coordinates and has the locality property. This key naming scheme is sometimes called a Quad Key, stemming from the fact that the partitioning scheme forms a Quad Tree. The locality property is the same as you get when using a Z-order curve to map a 2D-value into a 1D-value.
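A Python sketch of the conversion in both directions. Which letter maps to which sub tile is not fixed by the scheme above, so the order used here (A = north-west, B = north-east, C = south-west, D = south-east) is an assumption, as are the function names.

```python
LETTERS = "ABCD"  # assumed order: A=NW, B=NE, C=SW, D=SE


def tile_to_key(x, y, z):
    """Interleave the bits of x and y, most significant first, into a letter key."""
    key = []
    for level in range(z, 0, -1):
        bit = 1 << (level - 1)
        index = (1 if x & bit else 0) + (2 if y & bit else 0)
        key.append(LETTERS[index])
    return "".join(key)


def key_to_tile(key):
    """Inverse of tile_to_key: rebuild (x, y, z) from the letter key."""
    x = y = 0
    for ch in key:
        index = LETTERS.index(ch)
        x = (x << 1) | (index & 1)   # low bit of the index extends x
        y = (y << 1) | (index >> 1)  # high bit of the index extends y
    return x, y, len(key)
```

With this ordering, tile (3, 5) at zoom level 3 gets the key `CBD`, and its parent tile is simply the prefix `CB`. Because every descendant of a tile shares that tile's key as a prefix, a sorted index on the key keeps a tile's descendants next to each other.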

Please let me know if you need more details.

answered Sep 28 '22 22:09

Mårten Wikström


Tags:

php

google-maps-api-3

google-maps-markers

server-side

cluster-analysis
