
Using Scikit-learn KMeans to cluster multi-dimensional arrays

I've looked through some tutorials on KMeans with Scikit-learn, but I wasn't able to find anything specific to my case.

I have an array of objects, each with the following format:

{
    name: 'Bob',
    vector: [14,12,15,10,16,16,7,15,7,4,16,13,4,16,13,17,13,13,10,8,14,17,10,16,6,14,16,13,15,17,12,7,14,13,15,10]
}

So, I have an array of these objects [ {...}, {...}, ... ]

I want to run KMeans on the vector field of these objects to group similar items into clusters.

The vector values are normalized to range between 1 and 20.

Any help would be great. Thanks.

Dawn17 asked Oct 25 '25 16:10


1 Answer

Import the libraries:

from sklearn.cluster import KMeans
import numpy as np

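Convert your array of objects into the 2-D array that Scikit-learn's KMeans expects (one row per object, one column per vector element), assuming your list of objects is named data: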

data_for_clustering = [row['vector'] for row in data]
data_for_clustering = np.array(data_for_clustering)
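
KMeans needs a 2-D array of shape (n_samples, n_features), so every vector must have the same length. Here is a minimal sketch of that check, assuming made-up sample data (the names and the shortened vectors are illustrative, not from the question):

```python
import numpy as np

# Hypothetical sample data in the question's format (vectors shortened for brevity).
data = [
    {"name": "Bob", "vector": [14, 12, 15, 10]},
    {"name": "Alice", "vector": [3, 2, 4, 1]},
    {"name": "Carol", "vector": [13, 11, 16, 9]},
]

# Fail early if the vectors differ in length: they could not form a 2-D array.
lengths = {len(row["vector"]) for row in data}
if len(lengths) != 1:
    raise ValueError(f"vectors have inconsistent lengths: {sorted(lengths)}")

data_for_clustering = np.array([row["vector"] for row in data], dtype=float)
print(data_for_clustering.shape)  # (3, 4): 3 objects, 4 features each
```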

Run the clustering (n_clusters=2 is only an example; set it to the number of groups you want):

kmeans = KMeans(n_clusters=2, random_state=0).fit(data_for_clustering)

Get the cluster label assigned to each object (in the same order as the input rows):

kmeans.labels_
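
Since labels_ follows the order of the input rows, you can pair each label with the object's name. A rough end-to-end sketch, again with made-up data (the four objects and their short vectors are assumptions for illustration):

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sample data in the question's format (vectors shortened for brevity).
data = [
    {"name": "Bob", "vector": [14, 12, 15, 10]},
    {"name": "Carol", "vector": [13, 11, 16, 9]},
    {"name": "Alice", "vector": [3, 2, 4, 1]},
    {"name": "Dave", "vector": [2, 3, 5, 2]},
]

X = np.array([row["vector"] for row in data], dtype=float)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_[i] is the cluster index of data[i], so zip pairs them up.
clusters = defaultdict(list)
for row, label in zip(data, kmeans.labels_):
    clusters[int(label)].append(row["name"])

for label, names in sorted(clusters.items()):
    print(label, names)
```

The label numbers themselves (0 or 1) are arbitrary. Only the grouping is meaningful.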
Jingles answered Oct 27 '25 04:10