I have written a function that assigns training examples to their closest centroids as part of a K-means clustering algorithm. As far as I can tell the dimensions match, and the code sometimes runs correctly, but I frequently get the error
In an assignment A(:) = B, the number of elements in A and B must be the same.
for the line
idx(i) = find(dist == value);
Here is the code
function idx = findClosestCentroids(X, centroids)
K = size(centroids, 1);
idx = zeros(size(X,1), 1);
dist = zeros(K, 1);
for i = 1:size(X,1)
    for j = 1:K
        dist(j) = sum((X(i,:) - centroids(j,:)).^2);
    end
    value = min(dist);
    idx(i) = find(dist == value);
end
What is the problem here?
This happens because more than one centroid can lie at the same minimum distance from a query point. find returns the indices of all elements that satisfy the Boolean condition, not just the first one. idx(i) refers to a single element of the idx array, so when find returns more than one index, the assignment fails with the error you are seeing. That is also why the code only fails some of the time: it depends on whether a tie occurs in the data.
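To make the failure concrete, here is a minimal sketch that reproduces it. The distance values are made up for illustration (they are not from the question's data), and the exact wording of the error message varies between MATLAB and Octave versions, so the snippet just prints whatever message is raised.

```matlab
% Hypothetical distances from one point to three centroids;
% centroids 2 and 3 are tied for the minimum.
dist = [4; 1; 1];
value = min(dist);

% find returns every index where the condition holds, here two of them.
matches = find(dist == value);
disp(matches);

% Assigning two indices to a single element raises the error.
idx = zeros(3, 1);
failed = false;
try
    idx(1) = find(dist == value);
catch err
    failed = true;
    disp(err.message);
end
```

With no tie in dist, find returns a single index and the assignment succeeds, which matches the intermittent behaviour described in the question.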
Instead, use the second output argument of min, which returns the index of the first occurrence of the smallest value. That is exactly what you want here:
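function idx = findClosestCentroids(X, centroids)
K = size(centroids, 1);
idx = zeros(size(X,1), 1);   % closest centroid index for each example
dist = zeros(K, 1);          % squared distances from one example to each centroid
for i = 1:size(X,1)
    for j = 1:K
        % Squared Euclidean distance between example i and centroid j
        dist(j) = sum((X(i,:) - centroids(j,:)).^2);
    end
    % Changed: the second output of min is always a single index, even with ties
    [~, idx(i)] = min(dist);
end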
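As a quick check of the tie-breaking behaviour, the sketch below uses made-up data in which the first point is equally close to centroids 1 and 2. It is a compact variant, not the code above: it loops over centroids only and takes the row-wise minimum with min(D, [], 2), and the names D and delta are introduced here purely for illustration.

```matlab
% Hypothetical data: point [0 0] is equidistant from centroids 1 and 2.
X = [0 0; 5 5];
centroids = [1 0; -1 0; 5 4];

K = size(centroids, 1);
D = zeros(size(X, 1), K);    % D(i, j) = squared distance from point i to centroid j
for j = 1:K
    % Subtract centroid j from every row of X
    delta = bsxfun(@minus, X, centroids(j, :));
    D(:, j) = sum(delta.^2, 2);
end

% Row-wise minimum; the second output is the index of the first minimum in each row.
[~, idx] = min(D, [], 2);
disp(idx);
```

Because min reports the first occurrence, the tied point is assigned to centroid 1 rather than raising an error.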