
Accessing Spark MLlib Bisecting K-means tree data

Looking over the source code for Bisecting K-means, it seems to build an internal tree representation of the cluster assignments at each level as it progresses. Is it possible to get access to that tree? The built-in methods only give the cluster assignments at the leaves, not at the internal nodes.
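To make clear what kind of tree is meant, here is a minimal pure-Python sketch of bisecting k-means that keeps every split as a node. This is not Spark's implementation: the names `Node`, `two_means`, and `bisecting_kmeans` are hypothetical, and the rule "split the leaf with the largest sum of squared errors" is an assumption chosen for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class Node:
    """One cluster in the hierarchy; a leaf has no children."""
    points: List[Point]
    center: List[float]
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def mean(points: List[Point]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def sqdist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points: List[Point], rng: random.Random, iters: int = 20):
    """Plain 2-means on a single cluster; returns the two halves."""
    c0, c1 = rng.sample(points, 2)
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points if sqdist(p, c0) <= sqdist(p, c1)]
        right = [p for p in points if sqdist(p, c0) > sqdist(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. duplicate points
        c0, c1 = mean(left), mean(right)
    return left, right


def bisecting_kmeans(points: List[Point], k: int, seed: int = 0) -> Node:
    """Split leaves until there are k of them; return the root of the tree."""
    rng = random.Random(seed)
    root = Node(list(points), mean(points))
    while len(root.leaves()) < k:
        candidates = [n for n in root.leaves() if len(n.points) > 1]
        if not candidates:
            break
        # Assumed rule: split the leaf with the largest sum of squared errors.
        target = max(
            candidates,
            key=lambda n: sum(sqdist(p, n.center) for p in n.points),
        )
        left, right = two_means(target.points, rng)
        if not left or not right:
            break
        target.children = [Node(left, mean(left)), Node(right, mean(right))]
    return root
```

Here `root.leaves()` corresponds to what the built-in methods return, while `root.children` (and their children) are the internal nodes the question is about.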

Chankin asked Jan 20 '17 21:01


1 Answer

Follow-up on this: has anyone modified the Spark ML source code to store and return the hierarchical clustering tree structure?

I found a GitHub repo with an introduction to MLlib 1.6's implementation of bisecting k-means clustering: https://github.com/yu-iskw/bisecting-kmeans-blog/blob/master/blog-article.md

In the section "What's Next?", the first JIRA ticket, [SPARK-11664] "Add methods to get bisecting k-means cluster structure" (https://issues.apache.org/jira/browse/SPARK-11664), appears to be the request to expose the hierarchical cluster tree structure as a built-in feature. As of today, the ticket's status is marked as "resolved".

However, in the latest Spark MLlib implementation (2.4.4), documented at the links below, we could not find this tree structure, or dendrogram, as a built-in output:

PySpark MLlib 2.4.4 official documentation:
https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeans
https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeansModel

Scala MLlib 2.4.4 official documentation:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeans
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeansModel

We also looked through the source code, and the hierarchical tree structure does not seem to be stored as a built-in output.

If the hierarchical clustering tree structure is not available in Spark MLlib 2.4.4's BisectingKMeans, does anyone know of a modified version of the source code that makes the tree structure available?
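In the meantime, one rough workaround is to rebuild an approximate hierarchy from the leaf cluster centers alone (for example, the list returned by `clusterCenters` on the PySpark model). The sketch below greedily merges the two closest centers until one tree remains. It is an agglomerative approximation, not the actual sequence of splits Spark performed, and the function name `merge_centers` and the optional `sizes` weights are hypothetical.

```python
from itertools import combinations


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_centers(centers, sizes=None):
    """Greedily merge the closest pair of centers (assumes at least one center).

    Returns a nested tuple of leaf indices, e.g. ((0, 1), 2).
    """
    sizes = sizes or [1] * len(centers)
    # Each active entry is (subtree, center, weight).
    active = [(i, list(c), w) for i, (c, w) in enumerate(zip(centers, sizes))]
    while len(active) > 1:
        i, j = min(
            combinations(range(len(active)), 2),
            key=lambda ij: sqdist(active[ij[0]][1], active[ij[1]][1]),
        )
        (ta, ca, wa), (tb, cb, wb) = active[i], active[j]
        # Weighted mean of the two merged centers.
        merged_center = [(wa * x + wb * y) / (wa + wb) for x, y in zip(ca, cb)]
        merged = ((ta, tb), merged_center, wa + wb)
        active = [a for idx, a in enumerate(active) if idx not in (i, j)]
        active.append(merged)
    return active[0][0]
```

The integers in the result are positions in the input list, so they line up with the leaf cluster indices only if the centers are passed in the same order the model reports them.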

Thanks!

mflowww answered Oct 18 '22 20:10
