How can I fix a MemoryError when executing scikit-learn's silhouette score?

Question

I run a clustering algorithm and want to evaluate the result with the silhouette score in scikit-learn. Internally, however, scikit-learn computes the full pairwise distance matrix: distances = pairwise_distances(X, metric=metric, **kwds)

My data has on the order of 300K samples and my machine has 2 GB of memory, so the computation runs out of memory and I cannot evaluate the clustering result.

Does anyone know how to overcome this problem?

mwv · Accepted Answer

Set the sample_size parameter in the call to silhouette_score to a value well below 300K. With this parameter set, silhouette_score draws a random subset of that many data points from X and computes the score on the subset instead of the entire array, so the distance matrix is only sample_size × sample_size.
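As a rough illustration, here is a minimal sketch of that approach. The synthetic data from make_blobs, the KMeans clustering, the sample size of 2,000 and the helper full_matrix_gib are all assumptions made for the example; substitute your own X and labels. Passing random_state makes the subsample reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def full_matrix_gib(n_samples: int) -> float:
    """Size in GiB of a dense n x n float64 distance matrix (hypothetical helper)."""
    return n_samples * n_samples * 8 / 2**30


# A full 300K x 300K float64 matrix is far beyond 2 GB of RAM.
print(f"full matrix for 300K points: {full_matrix_gib(300_000):.0f} GiB")

# Synthetic stand-in for the real data (assumption: the original X is not shown).
X, _ = make_blobs(n_samples=10_000, centers=4, n_features=5, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Score a random subsample of 2,000 points instead of the whole array.
# The distance matrix is then 2,000 x 2,000 rather than n x n.
score = silhouette_score(X, labels, sample_size=2_000, random_state=0)
print(f"approximate silhouette score: {score:.3f}")
```

The result is an estimate of the full score, so it can vary from one subsample to the next. Repeating the call with a few different random_state values gives a sense of how stable the estimate is.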

Tags: memory, machine-learning, cluster-analysis, scikit-learn

Thien Bao
