 

Scikit-learn: How to run KMeans on a one-dimensional array?

I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with multidimensional arrays and not with one-dimensional ones. I guess there is a trick to make it work, but I don't know how. I saw that KMeans.fit() accepts "X : array-like or sparse matrix, shape=(n_samples, n_features)", but it seems to want n_samples to be greater than one.

I tried putting my array into a np.zeros() matrix and running KMeans, but then it puts all the non-zero values in class 1 and the rest in class 0.

Can anyone help me run this algorithm on a one-dimensional array?

asked Feb 09 '15 18:02 by Irene


1 Answer

You have many samples of 1 feature, so you can reshape the array to shape (13876, 1) using numpy's reshape:

```python
import numpy as np
from sklearn.cluster import KMeans

x = np.random.random(13876)   # 1-D array of 13,876 values in [0, 1)

km = KMeans()                 # n_clusters defaults to 8
km.fit(x.reshape(-1, 1))      # -1 will be calculated to be 13876 here
```
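A minimal follow-up sketch, not part of the original answer: after reshaping, the usual KMeans outputs give one label per original value, and `cluster_centers_` (shape `(n_clusters, 1)`) can be flattened back to 1-D. The random stand-in data, `n_clusters=3`, and `random_state=0` are assumptions for illustration only. `n_init=10` is set explicitly because its default changed across scikit-learn versions ("auto" from 1.4).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.random(13876)            # stand-in data (assumption): 13,876 values in [0, 1)

X = x.reshape(-1, 1)             # shape (13876, 1): 13,876 samples of 1 feature

# n_clusters=3 and random_state=0 are illustrative choices, not from the question
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)       # one cluster label per original value

# cluster_centers_ has shape (n_clusters, 1); ravel() turns it back into 1-D
centers = km.cluster_centers_.ravel()

# Report clusters ordered by their center, from low to high values
order = np.argsort(centers)
for k in order:
    members = x[labels == k]
    print(f"center={centers[k]:.3f}  size={members.size}  "
          f"range=[{members.min():.3f}, {members.max():.3f}]")
```

Because the data has a single feature, each cluster corresponds to a contiguous interval of values, so sorting by center gives a natural low-to-high reading of the groups.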
answered Sep 20 '22 00:09 by ryanpattison