Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Installing the Kmeans PostgreSQL extension on Amazon RDS

I take part in some Django poroject and we use geo data (with GeoDjango). I have installed PostGis as it described on AWS docs.

We have a lot of some points (markers) on the map. And we need to cluster them.

I found one library anycluster. This library need the PostgreSQL extension named kmeans-postgresql be installed on the Postgre database.

But my database is located on Amazon RDS. And I can't connect to it by SSH in order to install an extension...

Anybody knows how can I install kmeans-postgresql extension on my Amazon RDS database?

Or maybe you can advise me other ways of clustering?

like image 720
Anton Avatar asked Jun 10 '15 19:06

Anton


1 Answers

The K-Means It is a really complex calculation that is useful to data mining and cluster analysis ( you can see more about it in the wikipedia page https://en.wikipedia.org/wiki/K-means_clustering ). It have a big complexity when have to deal with many points. The K-means extension to postgresql http://pgxn.org/dist/kmeans/doc/kmeans.html it is written in C and compiled in the database machine. This brings a better performance compared to an procedure in plpgsql. Unfortunately as @estevao_lucas answered, this extension it is not enabled into Amazon RDS.

If you really need the k-means result, I translated this implementation of it, created by Joni Salonen in http://jonisalonen.com/2012/k-means-clustering-in-mysql/ and changed to plpgsql https://gist.github.com/thiagomata/a9737c3455d6248bef9f. This function uses temporary table. It is possible change it to use only arrays of Pins, if you wanna to.

But, if you only need to show some pins in a map, you will probably be happy with a really faster and simpler function that groups the results into an [x,y] matrix. I have created such function because the kmeans function was taking too much time to process my database (with a lot more than 400K elements). So this implementation is really faster, but does not have all the features you would expect from the K-means module. Besides that, this grid function https://gist.github.com/thiagomata/18ea14853998468c1a1d returns very good results, when the goal it is to show a big number of pins in a map. Example of Grid Result

like image 65
Thiago Mata Avatar answered Sep 25 '22 07:09

Thiago Mata