 

Geospatial marker clustering with elasticsearch

I have several hundred thousand documents in an elasticsearch index with associated latitudes and longitudes (stored as geo_point types). I would like to be able to create a map visualization that looks something like this: http://leaflet.github.io/Leaflet.markercluster/example/marker-clustering-realworld.388.html

So, I think what I want is to run a query with a bounding box (i.e., the map boundaries that the user is looking at) and return a summary of the clusters within this bounding box. Is there a good way to accomplish this in elasticsearch? A new indexing strategy perhaps? Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.


@kumetix - Good question. I'm responding to your comment here because the text was too long to fit in another comment. The geohash_precision setting dictates the maximum precision at which a geohash aggregation on that field can return results. For example, if geohash_precision is set to 8, we can run a geohash aggregation on that field with at most precision 8. According to the Elasticsearch reference, this would return results grouped in geohash boxes of roughly 38.2m x 19m. A precision of 7 or 8 would probably be accurate enough for a web-based heatmap like the one mentioned in the example above.

As far as how geohash_precision affects the cluster internals, I'm guessing the setting stores a geohash string of length <= geohash_precision inside the geo_point. Let's say we have a point at the Statue of Liberty: 40.6892,-74.0444. Its 12-character geohash is dr5r7p4xb2ts. Setting geohash_precision in the geo_point to 8 would internally store the strings: d, dr, dr5, dr5r, dr5r7, dr5r7p, dr5r7p4, dr5r7p4x

and a geohash_precision of 12 would additionally internally store the strings: dr5r7p4xb, dr5r7p4xb2, dr5r7p4xb2t, dr5r7p4xb2ts

resulting in a little more storage overhead for each geo_point. Setting the geohash_precision to a distance value (1km, 1m, etc) probably just stores it at the closest geohash string length precision value.

Note: How to calculate geohashes using python

$ pip install python-geohash
>>> import geohash
>>> geohash.encode(40.6892, -74.0444)
'dr5r7p4xb2ts'
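The prefix expansion described above can be sketched directly in Python. This is only an illustration of my guess about the internals, not Elasticsearch's actual storage code:

```python
def geohash_prefixes(gh, precision):
    """Return every prefix of a geohash string, up to the given precision.

    Mirrors the guess above about what a geo_point field with
    geohash_precision set would store internally.
    """
    return [gh[:i] for i in range(1, min(precision, len(gh)) + 1)]

# 12-character geohash of the Statue of Liberty (40.6892, -74.0444)
print(geohash_prefixes("dr5r7p4xb2ts", 8))
# ['d', 'dr', 'dr5', 'dr5r', 'dr5r7', 'dr5r7p', 'dr5r7p4', 'dr5r7p4x']
```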
Dan Noble, asked Oct 04 '22


2 Answers

In Elasticsearch 1.0, you can use the new Geohash Grid aggregation.

Something like geohashes could work, but it would cluster things into a rectangular grid, rather than the arbitrary polygons based on point density as seen in the example above.

This is true, but the geohash grid aggregation handles sparse data well, so all you need is enough points on your grid and you can achieve something pretty similar to the example in that map.
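As a sketch, the geohash grid aggregation could be combined with the bounding-box filter described in the question like this (the index, type, and field names here are assumptions, and the precision would be tuned to the zoom level):

```json
GET /things/thing/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left":     { "lat": 45.27, "lon": -34.45 },
            "bottom_right": { "lat": -35.32, "lon": 1.85 }
          }
        }
      }
    }
  },
  "aggs": {
    "clusters": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      }
    }
  }
}
```

Each bucket in the response carries a key (the geohash cell) and a doc_count; decoding each key back to its cell center gives a position and count to render as one cluster marker on the map.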

DrTech, answered Oct 07 '22


Try this:

https://github.com/triforkams/geohash-facet

We have been using it to do server-side clustering and it's pretty good.

Example query:

GET /things/thing/_search
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "Location": {
            "top_left": {
              "lat": 45.274886437048941,
              "lon": -34.453125
            },
            "bottom_right": {
              "lat": -35.317366329237856,
              "lon": 1.845703125
            }
          }
        }
      }
    }
  },
  "facets": {
    "places": {
      "geohash": {
        "field": "Location",
        "factor": 0.85
      }
    }
  }
}
sf., answered Oct 07 '22