
What would be the best k for this kmeans clustering? (Elbow point plot)

I am using k-means to find the best location to open a coffee shop near a subway station in Seoul.

Included features are:

  1. Total monthly passengers alighting at a particular station
  2. Rental Fees near a particular station
  3. Number of existing coffee shops near a particular station

I decided to use the elbow method to find the best k. I standardized all the features before running k-means.

[Image: elbow plot of SSE against k]

Now the elbow point seems to be k=3 (or maybe k=2), but I think the SSE is too high for an elbow point.

Also using k=3, it was difficult to gain insights from the clusters because there were only three of them.

Using k=5 was the sweet spot to gain insights.

Can using k=5 be justified even if it's not an elbow point?

Or is kmeans not a good option in the first place?

Matt Yoon asked Dec 06 '22 09:12



1 Answer

The elbow point is a heuristic, not a definitive rule: it often works, but not always, so I treat it as a rule of thumb for choosing a number of clusters to start from. On top of that, the elbow point cannot always be identified unambiguously, so you shouldn't worry too much about it.
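For reference, here is a minimal sketch of how an elbow curve like yours is typically computed with scikit-learn. The station data below is synthetic and the column meanings are assumptions based on your feature list, so the numbers themselves mean nothing; only the procedure is of interest.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real station data (hypothetical values):
# columns are monthly alights, rental fee, number of coffee shops.
rng = np.random.default_rng(0)
stations = np.column_stack([
    rng.normal(500_000, 150_000, 200),
    rng.normal(3_000, 800, 200),
    rng.poisson(20, 200),
])

# Standardize so that no single feature dominates the Euclidean distance.
X = StandardScaler().fit_transform(stations)

# Fit k-means for k = 1..10 and record the SSE (scikit-learn calls it inertia_).
sse = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = model.inertia_

for k, value in sse.items():
    print(f"k={k}: SSE={value:.1f}")
```

On standardized data the SSE at k=1 equals the number of rows times the number of features (200 x 3 = 600 here), so the absolute SSE mostly reflects the size of the dataset. A "high" SSE at the elbow is therefore not a problem in itself; what the plot tells you is how quickly the curve flattens.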

So in your case, if k=5 gives you better results or a better understanding of your data, then I would strongly suggest using k=5 rather than k=3!
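One way to check whether k=5 really is more interpretable is to look at the cluster centroids in the original units. The sketch below assumes synthetic data and hypothetical feature names (`monthly_alights`, `rental_fee`, `coffee_shops`); swap in your own array and names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names, chosen to match the three features in the question.
feature_names = ["monthly_alights", "rental_fee", "coffee_shops"]

# Synthetic stand-in for the real station data (hypothetical values).
rng = np.random.default_rng(0)
stations = np.column_stack([
    rng.normal(500_000, 150_000, 200),
    rng.normal(3_000, 800, 200),
    rng.poisson(20, 200),
])

scaler = StandardScaler()
X = scaler.fit_transform(stations)
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Map the centroids back to the original units so they are easier to read.
centers = scaler.inverse_transform(model.cluster_centers_)
sizes = np.bincount(model.labels_, minlength=5)

for i, (center, size) in enumerate(zip(centers, sizes)):
    summary = ", ".join(f"{name}={value:,.0f}" for name, value in zip(feature_names, center))
    print(f"cluster {i} ({size} stations): {summary}")
```

If each of the five centroids describes a distinct kind of station (for example busy, cheap rent, and few existing shops), that is a reasonable practical justification for k=5.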

As for your other question, there may be approaches that suit your data better, but that doesn't mean k-means isn't a good place to start. If you want to try other things, the scikit-learn documentation gives good guidance on which clustering algorithm or method to use.
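If you do want to compare k-means with other algorithms, a rough sketch could look like the following. The choice of agglomerative clustering and a Gaussian mixture as alternatives, and of the silhouette score as the yardstick, are my own assumptions for illustration and not a recommendation for your data; the data is again synthetic.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real station data (hypothetical values).
rng = np.random.default_rng(0)
stations = np.column_stack([
    rng.normal(500_000, 150_000, 200),
    rng.normal(3_000, 800, 200),
    rng.poisson(20, 200),
])
X = StandardScaler().fit_transform(stations)

k = 5  # the number of clusters the question found most useful

# Each entry produces one cluster label per station.
labelings = {
    "k-means": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X),
    "agglomerative": AgglomerativeClustering(n_clusters=k).fit_predict(X),
    "gaussian mixture": GaussianMixture(n_components=k, random_state=0).fit_predict(X),
}

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters.
scores = {name: silhouette_score(X, labels) for name, labels in labelings.items()}
for name, score in scores.items():
    print(f"{name}: silhouette={score:.3f}")
```

Like the elbow point, the silhouette score is only a heuristic, so treat it as one more input alongside how interpretable the clusters are.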

bglbrt answered Dec 08 '22 00:12
