
How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:

[Screenshot of the profiler output]

The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.

To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:

function [Y, q] = levels(X,nLevels)
% Assign each of the elements of X to an integer-valued level

p = linspace(0, 1.0, nLevels+1);

q = quantile(X,p);
if isvector(q)
    q=transpose(q);
end

Y = zeros(size(X));

for i = 1:nLevels
    % g marks the entries at or above the lower limit of bucket i, and l marks
    % the entries below its upper limit. The line Y(g & l) = i assigns the
    % value i to every element that falls in this bucket.
    if i ~= nLevels % The default: does not include the upper bound
        g = bsxfun(@ge,X,q(i,:));
        l = bsxfun(@lt,X,q(i+1,:));
    else            % For the final level we include the upper bound
        g = bsxfun(@ge,X,q(i,:));
        l = bsxfun(@le,X,q(i+1,:));
    end
    Y(g & l) = i;
end

Is there anything I can do to speed this up? Can the code be vectorized?

Chris Taylor asked Dec 22 '11 09:12



2 Answers

If I understand correctly, you want to know how many items fell in each bucket. Use:

n = hist(Y,nbins)

I am not sure it will help with the speedup, though. It is just cleaner this way.

Edit: following the comment:

You can use the second output parameter of histc:

[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
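A minimal sketch of how the bin output could stand in for the loop over levels. It is not a drop-in replacement for the function in the question: the example values of X and nLevels are made up, X is assumed to have more than one row so that quantile works down the columns, and the loop now runs over columns instead of levels.

```matlab
% Sketch: per-column bucket numbers from the second output of histc.
% X and nLevels are example values, not taken from the question.
X = [3 10; 1 20; 4 30; 6 40; 7 50; 2 60];
nLevels = 2;

p = linspace(0, 1, nLevels+1);
q = quantile(X, p, 1);               % (nLevels+1)-by-nCols bucket limits

Y = zeros(size(X));
for j = 1:size(X, 2)
    % bin(k) = i when q(i,j) <= X(k,j) < q(i+1,j). Entries equal to the
    % last limit land in an extra bin numbered nLevels+1.
    [~, bin] = histc(X(:, j), q(:, j));
    bin(bin == nLevels+1) = nLevels; % fold the column maximum into the top level
    Y(:, j) = bin;
end
disp(Y)
```

With these values each column is split at its median (3.5 and 35). Whether this is faster than the original loop depends on the shape of X, so it is worth profiling both. Newer MATLAB releases document histc as not recommended, so treat this as an illustration of the idea only.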

Andrey Rubshtein answered Oct 07 '22 22:10


How about this:

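function [Y, q] = levels(X,nLevels)
% Assumes X is a vector, so that q is a vector of nLevels+1 bucket limits.

p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
    % Each lower limit that an entry reaches raises its level by one.
    % The parentheses are needed because + binds tighter than >=.
    Y = Y + (X >= q(i));
end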

This results in the following:

>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)

Y =

     1  1  2  2  2  1

q =

     1  3.5  7

You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
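The same counting idea might be extended to a matrix X without any loop. This is only a sketch under assumptions: the example data and the name qLow are made up, X is assumed to have more than one row, and the intermediate logical array is nLevels times the size of X, so memory could become the limit for large inputs.

```matlab
% Sketch: count, for every entry, how many lower bucket limits it reaches.
% X, nLevels and qLow are illustrative and not part of the original code.
X = [3 10; 1 20; 4 30; 6 40; 7 50; 2 60];
nLevels = 2;

p = linspace(0, 1, nLevels+1);
q = quantile(X, p, 1);                     % (nLevels+1)-by-nCols limits

% Move the nLevels lower limits of each column into the third dimension.
qLow = permute(q(1:nLevels, :), [3 2 1]);  % 1-by-nCols-by-nLevels

% bsxfun expands X against qLow; summing over dimension 3 gives the level.
Y = sum(bsxfun(@ge, X, qLow), 3);
disp(Y)
```

Because only the lower limits are compared, the maximum of each column ends up in the top level without a special case.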

Aero Engy answered Oct 07 '22 23:10