 

PostGIS clustering with other aggregates

I would like to compute clusters of points and, for each cluster, get the sum of a specific attribute (let's say, the sum of the scores of the points in the cluster).

I already managed to build clusters using ST_ClusterWithin, but I cannot compute the sum.

Here is what I tried:

SELECT sum(score), unnest(ST_ClusterWithin(coordinates, 0.1)) AS cluster
FROM locations
GROUP BY cluster;

But I get the following error: ERROR: aggregate functions are not allowed in GROUP BY

If I remove the GROUP BY, I get the sum of the scores over all locations, which is not what I want (I want the sum over the locations in each cluster).

Chris asked Feb 18 '16 13:02


2 Answers

This is a tricky one, and the ST_ClusterWithin API doesn't seem well designed for what should be a common case.

The only solution I could find was to join the clusters back to the original table, as follows:

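-- Cluster all points, then join each cluster back to the points it contains
SELECT SUM(l.score) AS total_score, c.cluster
FROM locations AS l,
     (SELECT unnest(ST_ClusterWithin(coordinates, 0.1)) AS cluster
      FROM locations) AS c
-- ST_CollectionExtract(..., 1) turns the GeometryCollection into a MultiPoint
WHERE ST_Contains(ST_CollectionExtract(c.cluster, 1), l.coordinates)
GROUP BY c.cluster;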
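
A self-contained sketch of the same join, which can be pasted into any PostGIS-enabled database. The `locations` CTE and its three points are made-up sample data, and numbering the clusters with `WITH ORDINALITY` (instead of grouping by the cluster geometry itself) is an addition to the answer above, not part of it. The order of the array returned by ST_ClusterWithin is not documented, so treat `cluster_id` as an arbitrary label.

```sql
-- Hypothetical sample data: two points within 0.1 units of each other, one far away.
WITH locations(score, coordinates) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(0.05, 0)),
         (4, ST_MakePoint(5, 5))
),
-- One row per cluster, labelled by its position in the ST_ClusterWithin array.
clusters AS (
  SELECT c.cluster_id, c.cluster
  FROM (SELECT ST_ClusterWithin(coordinates, 0.1) AS arr FROM locations) AS agg,
       unnest(agg.arr) WITH ORDINALITY AS c(cluster, cluster_id)
)
-- Join each point back to the cluster that contains it, then aggregate per cluster.
SELECT cl.cluster_id, SUM(l.score) AS total_score, COUNT(*) AS n_points
FROM clusters AS cl
JOIN locations AS l
  ON ST_Contains(ST_CollectionExtract(cl.cluster, 1), l.coordinates)
GROUP BY cl.cluster_id
ORDER BY cl.cluster_id;
```

With this data the two nearby points should end up in one cluster (total score 3) and the distant point in another (total score 4).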

Edit: I've replaced ST_CollectionHomogenize with ST_CollectionExtract(<geometrycollection>, 1) (use 1 for points, 2 for linestrings and 3 for polygons), as suggested in this answer: https://gis.stackexchange.com/questions/195915/, because of this bug: https://trac.osgeo.org/postgis/ticket/3569

Don't ask me why you can't do ST_Contains(<geometrycollection>, <geometry>); we need to convert the collection to a MultiPoint, which is a permissible argument.
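
A small sketch of that conversion on a hand-written collection (the literal GEOMETRYCOLLECTION below is sample data standing in for one element of the ST_ClusterWithin result): extracting type 1 yields a MultiPoint, which ST_Contains then accepts.

```sql
-- Sample GeometryCollection, shaped like one cluster returned by ST_ClusterWithin.
WITH gc AS (
  SELECT ST_GeomFromText('GEOMETRYCOLLECTION(POINT(0 0),POINT(0.05 0))') AS geom
)
SELECT
  ST_GeometryType(ST_CollectionExtract(geom, 1)) AS extracted_type,  -- 'ST_MultiPoint'
  ST_Contains(ST_CollectionExtract(geom, 1), ST_MakePoint(0.05, 0)) AS contained  -- true
FROM gc;
```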

Meta: this question would have been a great match for https://gis.stackexchange.com/

EoghanM answered Oct 18 '22 01:10


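With PostGIS 2.3 or later, you can use the ST_ClusterDBSCAN window function, which directly returns the cluster index of each row. Setting its third parameter (minpoints) to 1 makes every point a core point, which reduces DBSCAN to single-linkage (hierarchical) clustering cut at distance eps, the same kind of grouping ST_ClusterWithin produces: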

WITH stat AS (
  SELECT
    -- eps = 0.1, minpoints = 1
    score, ST_ClusterDBSCAN(coordinates, 0.1, 1) OVER () AS cluster_id
  FROM
    locations
)
SELECT
  cluster_id, SUM(score)
FROM
  stat
GROUP BY
  cluster_id
ORDER BY
  cluster_id;
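
A self-contained version of the same query, runnable without an existing table. The `locations` CTE and its values are hypothetical sample data; the extra `COUNT(*)` column is an addition for illustration. Because minpoints is 1, no point should be labelled as noise (a NULL cluster_id), so every score lands in some cluster.

```sql
-- Hypothetical sample data with the same columns as the table in the question.
WITH locations(score, coordinates) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(0.05, 0)),
         (4, ST_MakePoint(5, 5))
),
stat AS (
  -- Window function: one 0-based cluster number per input row.
  SELECT score, ST_ClusterDBSCAN(coordinates, 0.1, 1) OVER () AS cluster_id
  FROM locations
)
SELECT cluster_id, SUM(score) AS total_score, COUNT(*) AS n_points
FROM stat
GROUP BY cluster_id
ORDER BY cluster_id;
```

Compared with the join-based answer, this needs only one pass over the table and no geometry containment test, at the cost of requiring PostGIS 2.3+.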

ewcz answered Oct 18 '22 01:10