How can I one-hot encode in MATLAB? [duplicate]

Often you are given a vector of integer values representing your labels (aka classes), for example

[2; 1; 3; 3; 2]

and you would like to one-hot encode this vector, so that each row contains a single 1 in the column given by the corresponding value of the labels vector, for example

[0 1 0;
 1 0 0;
 0 0 1;
 0 0 1;
 0 1 0]

osipov asked Aug 15 '16 00:08


3 Answers

For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. While your eye solution may work, it builds a max(X)-by-max(X) identity matrix first, so its memory usage grows quadratically with the number of unique values in X.

Y = bsxfun(@eq, X(:), 1:max(X));

Or as an anonymous function if you prefer:

hotone = @(X)bsxfun(@eq, X(:), 1:max(X));

Or, if you're on Octave (or MATLAB R2016b and later), you can take advantage of automatic broadcasting and simply do the following, as suggested by @Tasos.

Y = X(:) == 1:max(X);
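
As a quick check, the example vector from the question can be run through the bsxfun version. This is only a minimal sketch: the variable name Yd is made up for illustration, and the conversion with double assumes that numeric 0/1 output is wanted, since the comparison itself returns a logical matrix.

```matlab
% Labels from the question, as a column vector
X = [2; 1; 3; 3; 2];

% Compare each label against the class indices 1..max(X)
Y = bsxfun(@eq, X(:), 1:max(X));   % logical, numel(X)-by-max(X)

% Convert to double if numeric 0/1 values are needed (assumption)
Yd = double(Y);
disp(Yd)
```

The result should match the 5-by-3 matrix shown in the question.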

Benchmark

Here is a quick benchmark showing the performance of the various answers for a varying number of elements in X and a varying number of unique values in X.

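function benchit()
    % Time each approach over a grid of label ranges and vector lengths.
    nUnique = round(linspace(10, 1000, 10));    % upper bound on the label values
    nElements = round(linspace(10, 1000, 12));  % number of labels in X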

    times1 = zeros(numel(nUnique), numel(nElements));
    times2 = zeros(numel(nUnique), numel(nElements));
    times3 = zeros(numel(nUnique), numel(nElements));
    times4 = zeros(numel(nUnique), numel(nElements));
    times5 = zeros(numel(nUnique), numel(nElements));

    for m = 1:numel(nUnique)
        for n = 1:numel(nElements)
            X = randi(nUnique(m), nElements(n), 1);
            times1(m,n) = timeit(@()bsxfunApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times2(m,n) = timeit(@()eyeApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times3(m,n) = timeit(@()sub2indApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times4(m,n) = timeit(@()sparseApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times5(m,n) = timeit(@()sparseFullApproach(X));
        end
    end

    colors = get(0, 'defaultaxescolororder');

    figure;

    surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
    hold on
    surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);

    view([46.1, 34.8])

    grid on
    xlabel('Elements')
    ylabel('Unique Values')
    zlabel('Execution Time (ms)')

    legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end

function Y = bsxfunApproach(X)
    Y = bsxfun(@eq, X(:), 1:max(X));
end

function Y = eyeApproach(X)
    tmp = eye(max(X));
    Y = tmp(X, :);
end

function Y = sub2indApproach(X)
    LinearIndices = sub2ind([length(X), max(X)], (1:length(X))', X);
    Y = zeros(length(X), max(X));
    Y(LinearIndices) = 1;
end

function Y = sparseApproach(X)
    Y = sparse(1:numel(X), X,1);
end

function Y = sparseFullApproach(X)
    Y = full(sparse(1:numel(X), X,1));
end

Results

If you need a non-sparse output, bsxfun performs the best, but if you can use a sparse matrix (without converting it to a full matrix), then that is the fastest and most memory-efficient option.
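
The memory side of this can be inspected directly. The sketch below is not part of the benchmark above; the sizes and the names nClasses, Ysparse, Yfull, sInfo and fInfo are assumptions chosen for illustration. It builds the same encoding as a sparse and as a full matrix and prints the bytes each one occupies.

```matlab
nElements = 1000;
nClasses  = 1000;                      % assumed label range, for illustration
X = randi(nClasses, nElements, 1);
rows = (1:nElements)';

% Same one-hot encoding, stored sparse and full
Ysparse = sparse(rows, X, 1, nElements, nClasses);
Yfull   = full(Ysparse);

% whos returns a struct with a bytes field when called with an output
sInfo = whos('Ysparse');
fInfo = whos('Yfull');
fprintf('sparse: %d bytes, full: %d bytes\n', sInfo.bytes, fInfo.bytes);
```

The exact figures depend on the platform, but the sparse matrix stores only one nonzero per row, so it should be far smaller than the full one.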

Suever answered Oct 20 '22 10:10


I think this is fast, especially as the matrix dimensions grow:

Y = sparse(1:numel(X), X,1);

or

Y = full(sparse(1:numel(X), X,1));
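
One detail worth keeping in mind, sketched below under the assumption that the number of classes is known in advance: sparse infers the number of columns from the largest label that actually occurs, so a vector that happens to miss the highest class yields a narrower matrix. Passing the dimensions explicitly avoids that. The names nClasses, Yinferred and Yfixed are hypothetical and do not come from the answer.

```matlab
X = [2; 1; 2];            % class 3 does not occur in this vector
nClasses = 3;             % assumed known in advance (hypothetical name)
rows = (1:numel(X))';

% Column count inferred from max(X): the result is 3-by-2
Yinferred = sparse(rows, X, 1);

% Explicit size: the result is 3-by-3 with an all-zero third column
Yfixed = sparse(rows, X, 1, numel(X), nClasses);

disp(size(Yinferred))
disp(size(Yfixed))
```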

rahnema1 answered Oct 20 '22 10:10


You can use the identity matrix and index into it with the labels vector. For example, suppose the labels vector X is a random integer vector:

X = randi(3,5,1)

X =

   2
   1
   2
   3
   3

Then the following will one-hot encode X:

eye(max(X))(X,:)

which can be conveniently defined as a function using

hotone = @(v) eye(max(v))(v,:)

EDIT:

Although the solution above works in Octave, you have to modify it for MATLAB as follows, because MATLAB does not allow indexing directly into the result of a function call:

I = eye(max(X));
I(X,:)
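
If a one-line function is still wanted, one possible workaround is to do the indexing through subsref instead of chained parentheses. This is a sketch rather than part of the original answer; it reuses the name hotone from above and assumes the labels are positive integers.

```matlab
% Anonymous function that avoids chained indexing, so it should run in
% both MATLAB and Octave
hotone = @(v) subsref(eye(max(v)), substruct('()', {v, ':'}));

X = [2; 1; 3; 3; 2];
Y = hotone(X);

% Cross-check against the two-step form
I = eye(max(X));
assert(isequal(Y, I(X, :)));
```

The two-step form is easier to read; the subsref version is mainly useful when the encoding has to be passed around as a function handle.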

osipov answered Oct 20 '22 09:10