Guys, I am still a beginner trying to learn ML, so please forgive such a simple question. I took a dataset from the UCI ML Repository and started applying all kinds of unsupervised algorithms to it, including the K-Means clustering algorithm. When I printed out the accuracy score, it was negative, not just once but many times. As far as I know, scores aren't negative. Could you please help me understand why it's negative?
Any help is appreciated.
import pandas as pd
import numpy as np
a = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', names = ["a", "b", "c", "d","e","f","g","h","i"])
b = a
c = b.filter(a.columns[[8]], axis=1)
a.drop(a.columns[[8]], axis=1, inplace=True)
from sklearn.preprocessing import LabelEncoder
le1 = LabelEncoder()
le1.fit(a.a)
a.a = le1.transform(a.a)
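from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
x = np.array(a)
y = np.array(c)
# OneHotEncoder(categorical_features=...) was removed in scikit-learn 0.22;
# ColumnTransformer one-hot encodes column 0 and passes the other columns through.
ohe = ColumnTransformer([("cat", OneHotEncoder(), [0])], remainder="passthrough", sparse_threshold=0)
x = ohe.fit_transform(x)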
from sklearn.model_selection import train_test_split
xtr, xts, ytr, yts = train_test_split(x,y,test_size=0.2)
from sklearn import cluster
kmean = cluster.KMeans(n_clusters=2, init='k-means++', max_iter=100, n_init=10)
kmean.fit(xtr,ytr)
print(kmean.score(xts,yts))
Thank you!!
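The value returned by KMeans.score is not an accuracy. It measures how far the points are from their nearest centroids: scikit-learn returns the negative of the k-means objective, that is, minus the sum of squared distances from each sample to its closest cluster center. Because of that sign flip the score is never positive, and the closer it is to zero, the better. The y argument you pass to fit and score is ignored, since k-means is unsupervised.
A poor fit gives a large negative number, whereas a good fit gives a value close to zero. The magnitude also depends on the number of samples and the scale of the features, so compare scores only on the same data. If you want a positive number for plotting, take the absolute value of the output of the score method.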
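To make the sign concrete, here is a minimal sketch that recomputes the score by hand. It assumes synthetic data from make_blobs instead of the abalone file, and the variable names (X_train, X_test, manual) are purely illustrative. It relies on KMeans.transform, which returns the distance from each sample to every centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X_train, X_test = X[:240], X[240:]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# transform() gives the Euclidean distance from each sample to every centroid;
# keep the distance to the nearest one, square it, sum, and negate.
dists = kmeans.transform(X_test)
manual = -np.sum(dists.min(axis=1) ** 2)

score = kmeans.score(X_test)
print(score, manual)  # the two values should agree up to rounding
```

If the two printed numbers agree, that confirms the score is just the negated sum of squared distances, so a negative value is expected rather than a sign of a bug.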