I found this example of using the kmeans2 algorithm in Python, but I don't understand the following part:
import numpy
from scipy.cluster.vq import kmeans2, whiten
# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])
# whiten them
z = whiten(z)
# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(list(zip(xy[:,0],xy[:,1],z))),3)
The points are zip(xy[:,0],xy[:,1]), so what is the third value, z, doing here?
Also, what is whitening?
Any explanation is appreciated. Thanks.
First:
# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])
The weirdest thing about this is that it's equivalent to:
z = numpy.sin(0.8*xy[:, 1])
So I don't know why it's written that way; maybe there's a typo?
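A quick numerical check of that equivalence. The question doesn't show how xy was built, so this assumes a hypothetical stand-in of 30 random 2-D points (matching the shapes shown further down). Note that z depends only on the second column of xy:

```python
import numpy as np

# Hypothetical stand-in for the question's xy: 30 random 2-D points.
rng = np.random.default_rng(0)
xy = rng.random((30, 2))

y = xy[:, 1]
z_original = np.sin(y - 0.2 * y)   # as written in the example
z_simplified = np.sin(0.8 * y)     # algebraically identical: y - 0.2*y == 0.8*y

print(np.allclose(z_original, z_simplified))  # True
```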
Next,
# whiten them
z = whiten(z)
Whitening simply rescales the data to unit variance: each feature (column) is divided by its standard deviation. Here's a demo:
>>> import numpy as np
>>> from scipy.cluster import vq
>>> z = np.sin(.8*xy[:, 1])                 # the original z
>>> zw = vq.whiten(z)                       # save it under a different name
>>> zn = z / z.std()                        # make another 'normalized' array
>>> list(map(np.std, [z, zw, zn]))          # standard deviations of the three arrays
[0.42645, 1.0, 1.0]
>>> np.allclose(zw, zn) # whitened is the same as normalized
True
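The demo above uses a 1-D z. For a 2-D array of observations, whiten works per column: each feature is divided by its own standard deviation, and the mean is not subtracted. A small sketch on hypothetical data with three features of very different spreads (the data is made up for illustration):

```python
import numpy as np
from scipy.cluster.vq import whiten

rng = np.random.default_rng(1)
# Hypothetical observations: 100 rows, 3 features with very different scales.
obs = rng.normal(scale=[1.0, 10.0, 0.1], size=(100, 3))

white = whiten(obs)
# Same thing done by hand: divide each column by that column's std (ddof=0).
manual = obs / obs.std(axis=0)

print(np.allclose(white, manual))   # True
print(white.std(axis=0))            # every column now has std 1.0 (up to rounding)
```

Because k-means uses Euclidean distances, this per-column rescaling keeps a feature with a large spread (like the second column here) from dominating the clustering purely because of its units.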
It's not obvious to me why it is whitened. Anyway, moving along:
# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(list(zip(xy[:,0],xy[:,1],z))),3)
Let's break that into two parts:
data = np.array(list(zip(xy[:, 0], xy[:, 1], z)))
which is a weird (and slow) way of writing
data = np.column_stack([xy, z])
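To confirm the two spellings build the same array, here is a sketch using a hypothetical random xy of shape (30, 2). It also shows why the list() is needed in Python 3: zip returns an iterator, and NumPy wraps a bare iterator in a 0-d object array instead of unpacking it.

```python
import numpy as np

rng = np.random.default_rng(2)
xy = rng.random((30, 2))          # hypothetical stand-in for the question's xy
z = np.sin(0.8 * xy[:, 1])        # same z as above (before whitening)

# Python 3: zip() returns an iterator, so it must be materialised with list()
via_zip = np.array(list(zip(xy[:, 0], xy[:, 1], z)))
via_stack = np.column_stack([xy, z])   # (30, 2) and (30,) -> (30, 3)

print(via_zip.shape, via_stack.shape)      # (30, 3) (30, 3)
print(np.array_equal(via_zip, via_stack))  # True

# Without list(), NumPy stores the zip object itself in a 0-d object array.
broken = np.array(zip(xy[:, 0], xy[:, 1], z))
print(broken.shape, broken.dtype)          # () object
```

column_stack avoids building 30 Python tuples, which is why it is the faster option.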
In any case, you start with two arrays and merge them into one:
>>> xy.shape
(30, 2)
>>> z.shape
(30,)
>>> data.shape
(30, 3)
Then it's data that is passed to the k-means algorithm:
res, idx = vq.kmeans2(data, 3)
So you can see that it's 30 points in 3-D space that are passed to the algorithm; the only confusing part is how that set of points was created.
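Putting the pieces together, here is a sketch of the whole example written the way this answer suggests. It assumes xy is 30 random 2-D points (the original doesn't show how xy was made), and it passes minit='points' (start from k randomly chosen observations) instead of the default 'random'; that choice is mine, not the example's. Since kmeans2 initialises randomly, the actual cluster assignments can differ between runs; only the shapes are fixed.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten

rng = np.random.default_rng(3)
xy = rng.random((30, 2))                 # hypothetical: 30 random 2-D points

z = np.sin(0.8 * xy[:, 1])               # the "z values" (simplified form)
z = whiten(z)                            # rescale z to unit standard deviation
data = np.column_stack([xy, z])          # 30 points in 3-D: (x, y, z)

# k = 3 groups; returns (centroids, labels)
centroids, labels = kmeans2(data, 3, minit='points')

print(data.shape)       # (30, 3)
print(centroids.shape)  # (3, 3): one 3-D centroid per cluster
print(labels.shape)     # (30,): cluster index 0, 1 or 2 for each point
```

In the question's code, res corresponds to centroids and idx to labels.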