I am trying k-means to find the optimal place to open a coffee shop near a subway station in Seoul.
Included features are:
I decided to use the elbow point to find the best k, and I standardized all the features before running k-means.
Now the elbow point seems to be k=3 (or maybe k=2), but I think the SSE is too high for an elbow point.
Also using k=3, it was difficult to gain insights from the clusters because there were only three of them.
Using k=5 was the sweet spot to gain insights.
Can using k=5 be justified even if it's not an elbow point?
Or is kmeans not a good option in the first place?
The elbow point is a heuristic, not a definitive rule: it works most of the time but not always, so I treat it as a rule of thumb for choosing a number of clusters to start from. On top of that, the elbow often cannot be identified unambiguously, so you shouldn't worry too much about it.
So in that case, if you get better results or a better understanding of your data using k=5, then I would highly suggest you use k=5 rather than k=3!
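To make the k=3 versus k=5 comparison concrete, it can help to look at how much SSE each extra cluster removes instead of at the raw SSE value. Here is a minimal, dependency-free sketch of that idea. It assumes synthetic 2-D points standing in for your standardized station features; the data, the `kmeans_sse` helper, and its parameters are all made up for illustration and are not from your setup. With scikit-learn you would read the same quantity from the `inertia_` attribute of a fitted `KMeans`.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(group) for col in zip(*group))


def kmeans_sse(points, k, n_init=5, max_iter=100, seed=0):
    """Run plain Lloyd's k-means n_init times and return the lowest SSE found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: put each point in the group of its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Update step: move each center to the mean of its group
            # (an empty group keeps its previous center).
            new_centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
            if new_centers == centers:
                break
            centers = new_centers
        # SSE: squared distance from every point to its nearest center.
        sse_run = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse_run)
    return best


# Hypothetical data: five loose groups of 30 points each (an assumption,
# used only so the example runs on its own).
rng = random.Random(42)
blob_centers = [(0, 0), (6, 0), (0, 6), (6, 6), (3, 12)]
points = [
    (rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
    for cx, cy in blob_centers
    for _ in range(30)
]

sse = {k: kmeans_sse(points, k) for k in range(1, 9)}
for k in range(1, 9):
    # The share of the k=1 SSE is easier to compare than the raw number.
    print(f"k={k}  SSE={sse[k]:8.1f}  share of k=1 SSE={sse[k] / sse[1]:.2f}")
```

If the drop from k=3 to k=5 is still sizeable relative to the k=1 baseline, that is a reasonable argument for the more interpretable k=5. The absolute SSE on its own says little, because it grows with the number of points and features.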
Now, for your other question: there may be approaches that would better suit your data, but that doesn't mean k-means isn't a good way to start. If you want to try other things, the scikit-learn library documentation provides good guidance on which clustering algorithm or method to use.