I ran a clustering algorithm and want to evaluate the result using the silhouette score in scikit-learn. However, scikit-learn needs to compute the full pairwise distance matrix: distances = pairwise_distances(X, metric=metric, **kwds)
My data has on the order of 300K points and my machine has only 2GB of memory, so the computation runs out of memory and I cannot evaluate the clustering result.
Does anyone know how to overcome this problem?
Set the sample_size parameter in the call to silhouette_score to some value smaller than 300K. Using this parameter will sample datapoints from X and calculate the silhouette_score on those instead of the entire array.
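For scale, a full 300,000 x 300,000 matrix of float64 distances would take roughly 720 GB, so subsampling is the practical route on a 2GB machine. Here is a minimal sketch of the call. The synthetic data from make_blobs, the KMeans clustering, and the specific sizes are assumptions for illustration only; substitute your own X and labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Stand-in data (assumption): replace with your own feature matrix and labels.
X, _ = make_blobs(n_samples=5000, centers=4, n_features=8, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# sample_size draws a random subset of the points, so distances are only
# computed among 1000 samples instead of all rows of X.
# random_state makes the subsample (and therefore the score) reproducible.
score = silhouette_score(
    X,
    labels,
    metric="euclidean",
    sample_size=1000,
    random_state=0,
)
print(f"Approximate silhouette score: {score:.3f}")
```

Since the score is computed on a random subset, it is an estimate. Repeating the call with a few different random_state values and averaging gives a sense of how stable it is.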