I am currently trying to do some k-means clustering on data stored in one of the columns of my pandas.DataFrame. The odd thing is that instead of treating each row as a separate example, it treats all rows as one example in a very high dimension. For example:
import numpy as np
import pandas as pd
from sklearn import cluster

df = pd.read_csv('D:\\Apps\\DataSciense\\Kaggle Challenges\\Titanic\\Source Data\\train.csv', header=0)

# median age for each Gender/Pclass combination
median_ages = np.zeros((2, 3))
for i in range(0, 2):
    for j in range(0, 3):
        median_ages[i, j] = df[(df.Gender == i) & (df.Pclass == j + 1)].Age.dropna().median()

# fill missing ages with the matching group median
df['AgeFill'] = df['Age']
for i in range(0, 2):
    for j in range(0, 3):
        df.loc[(df.Age.isnull()) & (df.Gender == i) & (df.Pclass == j + 1), 'AgeFill'] = median_ages[i, j]
Then I just check that it looks fine:
df.AgeFill
Name: AgeFill, Length: 891, dtype: float64
Looks OK: 891 float64 numbers. Then I do the clustering:
k_means = cluster.KMeans(n_clusters=1, init='random')
k_means.fit(df.AgeFill)
And I check for cluster centers:
k_means.cluster_centers_
It returns one giant array.
Furthermore:
k_means.labels_
Gives me:
array([0])
What am I doing wrong? Why does it think I have one example with 891 dimensions instead of 891 examples?
Just to illustrate it better, if I try 2 clusters:
k_means = cluster.KMeans(n_clusters=2, init='random')
k_means.fit(df.AgeFill)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    k_means.fit(df.AgeFill)
  File "D:\Apps\Python\lib\site-packages\sklearn\cluster\k_means_.py", line 724, in fit
    X = self._check_fit_data(X)
  File "D:\Apps\Python\lib\site-packages\sklearn\cluster\k_means_.py", line 693, in _check_fit_data
    X.shape[0], self.n_clusters))
ValueError: n_samples=1 should be >= n_clusters=2
So you can see that it really does think it is just one giant sample.
But:
df.AgeFill.shape
(891,)
You are passing a 1D array, while scikit-learn expects a 2D array with a samples axis and a features axis. This should do it:
k_means.fit(df.AgeFill.values.reshape(-1, 1))
Before:
>>> df.AgeFill.shape
(891,)
After:
>>> df.AgeFill.values.reshape(-1, 1).shape
(891, 1)
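Equivalently, you can select the column as a single-column DataFrame, which is already 2D, so no reshape is needed. A minimal sketch, assuming the same df and imports as above and using only AgeFill as the feature:

from sklearn import cluster

# a single-column DataFrame has shape (891, 1): 891 samples, 1 feature
X = df[['AgeFill']]
print(X.shape)                     # (891, 1)

k_means = cluster.KMeans(n_clusters=2, init='random')
k_means.fit(X)
print(k_means.cluster_centers_)    # one center per cluster, each a single coordinate
print(k_means.labels_.shape)       # (891,) -- one label per row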