
Blaze with Scikit Learn K-Means

I am trying to fit a Blaze data object with scikit-learn's KMeans.

from blaze import *
from sklearn.cluster import KMeans
data_numeric = Data('data.csv')
data_cluster = KMeans(n_clusters=5)
data_cluster.fit(data_numeric)

Data Sample:

A  B  C
1  32 34
5  57 92
89 67 21

It throws an error (the screenshot of the traceback is not reproduced here).

I have been able to do this with a pandas DataFrame. Is there any way to feed a Blaze object to this function?
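
For reference, the pandas route mentioned above might look like the following minimal sketch. It assumes data.csv is comma-separated and holds only the three sample rows shown earlier; `csv_text` is a stand-in for the file, and `n_clusters=2` is used because KMeans cannot form 5 clusters from 3 rows.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for data.csv (assumed comma-separated), built from the sample rows.
csv_text = "A,B,C\n1,32,34\n5,57,92\n89,67,21\n"
frame = pd.read_csv(io.StringIO(csv_text))

# n_clusters must not exceed the number of rows, so 2 is used here
# instead of the 5 in the question.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(frame)

labels = model.labels_             # one cluster index per row
centers = model.cluster_centers_   # one row per cluster, one column per feature
```

With the real file, `pd.read_csv('data.csv')` would replace the `io.StringIO` stand-in.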

asked Sep 29 '16 08:09 by sachin saxena


2 Answers

I think you need to convert your Blaze data object into a NumPy array before you fit.

from blaze import *
import numpy

from sklearn.cluster import KMeans
data_numeric = numpy.array(data('data.csv'))
data_cluster = KMeans(n_clusters=5)
data_cluster.fit(data_numeric)
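
The same idea can be sketched without Blaze: whatever the source, KMeans wants a 2-D numeric array of shape (n_samples, n_features). Here `rows` is a hypothetical stand-in for the records materialized from the Blaze object, and `n_clusters=2` is an assumption chosen to fit the three sample rows.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the rows materialized from the Blaze object.
rows = [(1, 32, 34), (5, 57, 92), (89, 67, 21)]

# Convert to a float array and check the shape before fitting.
features = np.asarray(rows, dtype=float)
assert features.ndim == 2  # KMeans expects (n_samples, n_features)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
assignments = model.predict(features)  # cluster index for each row
```

Checking `features.ndim` and `features.dtype` before calling `fit` is a cheap way to confirm the conversion produced a plain numeric array rather than an object array.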

answered Oct 05 '22 11:10 by aberger


sklearn.cluster.KMeans does not support input of type blaze.interactive._Data, which is the type of data_numeric in your code.

You can call data_cluster.fit(data_numeric.peek()) instead: peek() converts data_numeric into a DataFrame, a type that sklearn.cluster.KMeans supports.

answered Oct 05 '22 12:10 by yhuang