Using scipy's kmeans2 function in python

Question

I found this example of using the kmeans2 algorithm in Python. I can't understand the following part:

# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

# whiten them
z = whiten(z)

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

The points are zip(xy[:,0],xy[:,1]), so what is the third value z doing here?

Also, what is whitening?

Any explanation is appreciated. Thanks.

askewchan · Accepted Answer

First:

# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

The weirdest thing about this is that it's equivalent to:

z = numpy.sin(0.8*xy[:, 1])

So I don't know why it's written that way. Maybe there's a typo?
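A quick numerical check of that equivalence, as a sketch. The `xy` array here is a random stand-in (an assumption on my part, since the example's actual data isn't shown); only its shape, 30 points in 2D, matches the example.

```python
import numpy as np

# Hypothetical stand-in for the example's xy array: 30 random 2D points.
rng = np.random.default_rng(0)
xy = rng.random((30, 2))

z_original = np.sin(xy[:, 1] - 0.2 * xy[:, 1])  # as written in the example
z_simplified = np.sin(0.8 * xy[:, 1])           # algebraically the same argument

# Compare with a floating-point tolerance rather than exact equality.
same = np.allclose(z_original, z_simplified)
print(same)
```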

Next,

# whiten them
z = whiten(z)

Whitening simply normalizes the variance of the population: the values are divided by their standard deviation, so the result has unit variance. Here is a demo:

>>> import numpy as np
>>> from scipy.cluster import vq
>>> z = np.sin(.8*xy[:, 1])      # the original z
>>> zw = vq.whiten(z)            # save it under a different name
>>> zn = z / z.std()             # make another 'normalized' array
>>> list(map(np.std, [z, zw, zn]))  # standard deviations of the three arrays
[0.42645, 1.0, 1.0]
>>> np.allclose(zw, zn)          # whitened is the same as normalized
True
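When the input is a 2D observation matrix, whiten rescales each column (feature) separately. Here is a small sketch with made-up data; the array `obs` and its scales are purely illustrative and not from the original example.

```python
import numpy as np
from scipy.cluster import vq

# Made-up observations: 30 rows, 3 features on very different scales.
rng = np.random.default_rng(1)
obs = rng.normal(size=(30, 3)) * np.array([1.0, 10.0, 100.0])

whitened = vq.whiten(obs)        # divides each column by its standard deviation
manual = obs / obs.std(axis=0)   # the same rescaling done by hand

print(whitened.std(axis=0))      # per-column standard deviations after whitening
print(np.allclose(whitened, manual))
```

The usual reason to whiten before k-means is that Euclidean distance otherwise favors features with a larger spread. Note that the example whitens only z and leaves x and y on their original scale.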

It's not obvious to me why it is whitened. Anyway, moving along:

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

Let's break that into two parts:

data = np.array(zip(xy[:, 0], xy[:, 1], z))

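which is a weird (and slow) way of writing the line below (on Python 3 it would also need list(zip(...)), because zip returns an iterator there):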

data = np.column_stack([xy, z])
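To see that the two constructions give the same array, here is a sketch using a random stand-in for `xy` (an assumption; the example's real data isn't shown) and the simplified form of `z`, without the whitening step.

```python
import numpy as np

# Hypothetical stand-in data with the same shapes as in the example.
rng = np.random.default_rng(2)
xy = rng.random((30, 2))
z = np.sin(0.8 * xy[:, 1])

# The zip-based construction, wrapped in list() so it runs on Python 3.
data_zip = np.array(list(zip(xy[:, 0], xy[:, 1], z)))

# The direct construction: append z as a third column.
data_stack = np.column_stack([xy, z])

print(data_zip.shape, data_stack.shape)
print(np.array_equal(data_zip, data_stack))
```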

In any case, you start with two arrays and merge them into one:

>>> xy.shape
(30, 2)
>>> z.shape
(30,)
>>> data.shape
(30, 3)

Then it's data that is passed to the kmeans algorithm:

res, idx = vq.kmeans2(data, 3)

So now you can see that it's 30 points in 3D space that are passed to the algorithm, and the confusing part is how the set of points was created.
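Putting the pieces together, the whole pipeline could look roughly like this. It is only a sketch: `xy` is a random stand-in for the example's points, and `minit="points"` (start from randomly chosen observations) is my choice rather than something the example specifies.

```python
import numpy as np
from scipy.cluster import vq

# Hypothetical stand-in for the example's 30 two-dimensional points.
rng = np.random.default_rng(3)
xy = rng.random((30, 2))

# Third coordinate, rescaled to unit standard deviation.
z = vq.whiten(np.sin(0.8 * xy[:, 1]))

# 30 points in 3D space: columns are x, y, z.
data = np.column_stack([xy, z])

# Cluster into k=3 groups; minit="points" is an assumption, not from the example.
centroids, labels = vq.kmeans2(data, 3, minit="points")

print(centroids.shape)  # one 3D centroid per cluster
print(labels.shape)     # one cluster index per input point
```

Here `centroids` and `labels` play the roles of `res` and `idx` in the original snippet. The initial centroids are picked at random, so the cluster assignments can differ between runs.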

Tags: python, scipy, k-means

kamalbanga
