I can calculate cluster membership with KMeans
pretty easily:
open System
open System.IO
open Utils
open Accord
open Accord.Math
open Accord.MachineLearning
let vals = [|
[|1.0; 2.0; 3.0; 2.0|]
[|1.1; 1.9; 3.1; 4.0|]
[|2.0; 3.0; 4.0; 4.0|]
[|3.0; 3.1; 2.0; 3.0|]
[|2.0; 4.0; 3.0; 6.0|]
[|1.0; 5.0; 5.0; 7.0|]
[|4.0; 3.0; 6.0; 8.0|]
[|5.0; 4.0; 3.0; 6.0|]
[|6.0; 4.0; 8.0; 7.0|]
[|5.0; 6.0; 5.0; 9.0|]
[|4.0; 2.0; 7.0; 8.0|]
[|8.0; 9.0; 3.1; 2.2|]
[|8.0; 9.0; 2.0; 2.0|]
[|10.0; 2.0; 3.0; 2.0|]
[|10.1; 1.9; 3.1; 4.0|]
[|20.0; 3.0; 4.0; 4.0|]
[|22.0; 7.0; 2.0; 3.0|]
[|21.0; 4.0; 3.0; 6.0|]
|]
let kmeans = new KMeans 5
let clusterModel = kmeans.Learn vals
let clusters = clusterModel.Decide vals
Can I calculate partial membership with the standard KMeans
algorithm? A coworker suggested using the mean and variances of the cluster members to determine proportional membership and today I've been looking into fuzzy sets and their implementations for F#
. For example, here is some documentation for the Accord.net implementation for fuzzy sets. I can translate/run the example for F#
but at first glance, I don't see an easy way to get the data from my Kmeans
run above to fit the format of assigning partial membership.
Questions:
How would I use the mean/variance of cluster members to calculate partial membership?
Is there an easy way to calculate partial membership with KMeans
clustering with the Accord.net library?
The KMeans algorithm in Accord.net is simple to implement; should I spend some time trying to learn this method of clustering/membership to suit my problem rather than try and force Kmeans clustering to suit my needs?
As mentioned by Tomas, Accord.NET already gives you many of the building blocks. In particular, calling clusterModel.Scores
gives you the (negative) distances to the cluster centroids, see source code
From the negative distances, you can compute an approximate class membership score by taking exponentials, similar to what you would do to compute a Gaussian PDF. In F#, that would look like:
// Scores returns the negative distances between each point
// and the cluster centroid
let negDistances = clusterModel.Scores vals
// Compute an estimated cluster assigment score
let clusterMembership =
negDistances
|> Array.map (fun distances ->
// Take the Exponential of the (negative) distances,
// as in computing a Gaussian pdf
let expDist = distances |> Array.map Math.Exp
let total = Array.sum expDist
expDist
|> Array.map (fun d -> d/total)
)
There are a couple of caveats here:
Regarding your third question: I would not re-implement. It may seem easy initially, but there are usually plenty of corner cases and stability issues that you only run into after some time.
You should be able to use Accord.NET to get the "centroids" of the clusters that the K-means algorithm finds. Those are essentially the centres of the individual clusters. You should then be able to calculate the distance between your new data point and each of the centroids to see which of the centroids are close to your point. (The Decide
method returns just the first one.)
I have not tried this, but it seems that KMeans
exposes Clusters
, which is a KMeansClusterCollection
and has the Centroids
property (see the docs). It also exposes the Distance
property which returns the function for calculating distance between the data points.
Using these, you should be able to compare the distance of your data point with the centroids of all the clusters and decide how close the point is to individual clusters.
Implementing k-means from scratch is not that hard (there is a nice post from Mathias Brandewinder on this), but it seems that Accord.NET exposes all the information that you need in this particular case - so perhaps that's all you need (getting all the details right in custom implementation is always the hardest part...).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With