I'm experimenting with a clustering model in PySpark. I'm trying to get the mean squared cost of the cluster fit for different values of K:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

def meanScore(k, df):
    inputCols = df.columns[:38]
    assembler = VectorAssembler(inputCols=inputCols, outputCol="features")
    kmeans = KMeans().setK(k)
    pipeModel2 = Pipeline(stages=[assembler, kmeans])
    kmeansModel = pipeModel2.fit(df).stages[-1]
    return kmeansModel.computeCost(assembler.transform(df)) / df.count()
When I call this function to compute the cost for different values of K on the dataframe:
for k in range(20, 100, 20):
    sc = meanScore(k, numericOnly)
    print((k, sc))
I receive an attribute error: AttributeError: 'KMeansModel' object has no attribute 'computeCost'
I'm fairly new to PySpark and am just learning; I sincerely appreciate any help with this. Thanks.
As Erkan Sirin mentioned, computeCost was deprecated in Spark 2.4 and removed in Spark 3.0, so it no longer exists on KMeansModel. The following may help you solve your problem.
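If you just want the same cost number, Spark 3.x exposes it on the fitted model as summary.trainingCost, the within-set sum of squared errors on the training data, which is what computeCost gave you here since you scored the same df you trained on. A minimal sketch, reusing your meanScore (same imports as in your question) and only changing its last line:

def meanScore(k, df):
    inputCols = df.columns[:38]
    assembler = VectorAssembler(inputCols=inputCols, outputCol="features")
    kmeans = KMeans().setK(k)
    pipeModel2 = Pipeline(stages=[assembler, kmeans])
    kmeansModel = pipeModel2.fit(df).stages[-1]
    # summary.trainingCost replaces the removed computeCost for the training data
    return kmeansModel.summary.trainingCost / df.count()

Alternatively, the recommended replacement for model evaluation is ClusteringEvaluator: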
from pyspark.ml.evaluation import ClusteringEvaluator

# Make predictions
predictions = model.transform(dataset)

# Evaluate clustering by computing the Silhouette score
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
I hope this helps; you can check the official docs for more information.