Often you are given a vector of integer values representing your labels (aka classes), for example:
[2; 1; 3; 3; 2]
and you would like to one-hot encode this vector. Row i of the result should contain a single 1, in the column given by the i-th label, and zeros everywhere else. For example:
[0 1 0;
1 0 0;
0 0 1;
0 0 1;
0 1 0]
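The loop below is a minimal sketch that builds the encoding row by row, to make the definition concrete. It is not taken from any of the answers, and it assumes the labels are positive integers starting at 1:

```octave
% Labels from the example above (assumed to be positive integers)
X = [2; 1; 3; 3; 2];
Y = zeros(numel(X), max(X));   % one row per label, one column per class
for i = 1:numel(X)
    Y(i, X(i)) = 1;            % put a 1 in the column named by the label
end
disp(Y)
```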
For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. Your eye solution works, but its memory usage grows quadratically with the number of unique values in X. That is because eye(max(X)) allocates a full max(X)-by-max(X) matrix before indexing into it.
Y = bsxfun(@eq, X(:), 1:max(X));
Or as an anonymous function if you prefer:
hotone = @(X)bsxfun(@eq, X(:), 1:max(X));
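Applied to the example labels, the anonymous function reproduces the matrix from the question. Note that bsxfun(@eq, ...) returns a logical array. The double conversion below is an optional extra step, useful only if later code expects numeric values:

```octave
hotone = @(X) bsxfun(@eq, X(:), 1:max(X));
labels = [2; 1; 3; 3; 2];
Y = hotone(labels);      % 5-by-3 logical matrix
Yd = double(Y);          % optional: convert to a double matrix
disp(Yd)
```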
Or, if you're on Octave (or MATLAB R2016b and later), you can take advantage of automatic broadcasting and simply do the following, as suggested by @Tasos. Using X(:) makes sure the labels form a column even if X is a row vector:
Y = X(:) == 1:max(X);
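All of these one-liners take the number of columns from max(X). If the highest class happens not to appear in a particular batch of labels, the output is narrower than expected. The sketch below fixes the width, assuming the total number of classes is known in advance. Here nClasses is a hypothetical variable, and the implicit expansion requires Octave or MATLAB R2016b+:

```octave
X = [2; 1; 2];                 % class 3 does not occur in this batch
nClasses = 3;                  % hypothetical: total number of classes, known beforehand
Y = X(:) == 1:nClasses;        % always numel(X)-by-nClasses
Ymax = X(:) == 1:max(X);       % only numel(X)-by-max(X), i.e. 3-by-2 here
```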
Here is a quick benchmark comparing the performance of the various answers as both the number of elements in X and the number of unique values in X vary.
function benchit()
    % Time each one-hot encoding approach over a grid of
    % (number of unique values) x (number of elements)
    nUnique = round(linspace(10, 1000, 10));
    nElements = round(linspace(10, 1000, 12));
    times1 = zeros(numel(nUnique), numel(nElements));
    times2 = zeros(numel(nUnique), numel(nElements));
    times3 = zeros(numel(nUnique), numel(nElements));
    times4 = zeros(numel(nUnique), numel(nElements));
    times5 = zeros(numel(nUnique), numel(nElements));
    for m = 1:numel(nUnique)
        for n = 1:numel(nElements)
            % Draw fresh random labels for each approach, then time it
            X = randi(nUnique(m), nElements(n), 1);
            times1(m,n) = timeit(@()bsxfunApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times2(m,n) = timeit(@()eyeApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times3(m,n) = timeit(@()sub2indApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times4(m,n) = timeit(@()sparseApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times5(m,n) = timeit(@()sparseFullApproach(X));
        end
    end
    % Plot one semi-transparent surface per approach (times in ms)
    colors = get(0, 'defaultaxescolororder');
    figure;
    surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
    hold on
    surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);
    view([46.1000 34.8000])
    grid on
    xlabel('Elements')
    ylabel('Unique Values')
    zlabel('Execution Time (ms)')
    legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end
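% bsxfun: compare each label against 1:max(X) (logical output)
function Y = bsxfunApproach(X)
    Y = bsxfun(@eq, X(:), 1:max(X));
end
% eye: build an identity matrix and select its rows
function Y = eyeApproach(X)
    tmp = eye(max(X));
    Y = tmp(X, :);
end
% sub2ind: set one linear index per row of a preallocated matrix
function Y = sub2indApproach(X)
    LinearIndices = sub2ind([length(X), max(X)], (1:length(X))', X);
    Y = zeros(length(X), max(X));
    Y(LinearIndices) = 1;
end
% sparse: keep the result as a sparse matrix
function Y = sparseApproach(X)
    Y = sparse(1:numel(X), X, 1);
end
% full(sparse): build a sparse matrix, then convert it to a full one
function Y = sparseFullApproach(X)
    Y = full(sparse(1:numel(X), X, 1));
end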
If you need a non-sparse output, bsxfun performs the best. If you can use a sparse matrix (without converting it to a full matrix), that is the fastest and most memory-efficient option.
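Before trusting the timings, it is worth confirming that the approaches actually return the same matrix. The sketch below does that check. It rewrites the benchmarked approaches inline (rather than calling the benchmark's local functions) so the snippet runs on its own. It also converts every result to a full double matrix so the different output classes compare cleanly:

```octave
X = randi(5, 20, 1);                   % random labels in 1..5
n = numel(X);
k = max(X);
% Inline versions of the approaches from the benchmark
viaBsxfun = double(bsxfun(@eq, X(:), 1:k));
I = eye(k);
viaEye = I(X, :);
viaSub2ind = zeros(n, k);
viaSub2ind(sub2ind([n, k], (1:n)', X)) = 1;
viaSparse = full(sparse((1:n)', X, 1));
isSame = isequal(viaBsxfun, viaEye, viaSub2ind, viaSparse);
```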
I think this is fast, especially as the matrix dimensions grow:
Y = sparse(1:numel(X), X,1);
or, if you need a full matrix:
Y = full(sparse(1:numel(X), X,1));
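The three-argument sparse call infers the size from its index vectors, so the number of columns again equals max(X). If the number of classes is known, the five-argument form sparse(i, j, v, m, n) fixes the size explicitly. In the sketch below, nClasses is a hypothetical variable:

```octave
X = [2; 1; 2];
nClasses = 3;                                     % hypothetical: known number of classes
S = sparse((1:numel(X))', X, 1, numel(X), nClasses);
F = full(S);                                      % dense copy, only if really needed
```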
You can use the identity matrix and index into it using the input/labels vector. For example, suppose the labels vector X is a random integer vector:
X = randi(3,5,1)
X =
2
1
2
3
3
Then the following one-hot encodes X:
eye(max(X))(X,:)
which can be conveniently defined as a function using
hotone = @(v) eye(max(v))(v,:)
EDIT:
The solution above works in Octave, but MATLAB does not allow indexing directly into the result of a function call such as eye(max(X))(X,:). In MATLAB, you have to modify it as follows:
I = eye(max(X));
I(X,:)
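For the same reason, the Octave anonymous function hotone = @(v) eye(max(v))(v,:) has no direct MATLAB equivalent. One workaround, which is an assumption and not part of the original answer, is to call subsref explicitly. It performs the same (v, :) parenthesis indexing and runs in both MATLAB and Octave:

```octave
% Index into eye(max(v)) with (v, :) without chained indexing
hotone = @(v) subsref(eye(max(v)), struct('type', '()', 'subs', {{v, ':'}}));
Y = hotone([2; 1; 3; 3; 2]);
```

This still allocates the full max(v)-by-max(v) identity matrix, so the memory concern raised about the eye approach applies unchanged.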