How can I speed up this call to quantile in Matlab?

Tags: optimization, vectorization, matlab, bsxfun

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:

[Profiler screenshot: https://i.stack.imgur.com/FmITm.png]

The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.

To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:

function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"

p = linspace(0, 1.0, nLevels+1);

q = quantile(X,p);
if isvector(q)
    q=transpose(q);
end

Y = zeros(size(X));

for i = 1:nLevels
    % "The variables g and l indicate the entries that are respectively greater than
    % or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
    % value i to any element that falls in this bucket."
    if i ~= nLevels % "The default; doesnt include upper bound"
        g = bsxfun(@ge,X,q(i,:));
        l = bsxfun(@lt,X,q(i+1,:));
    else            % "For the final level we include the upper bound"
        g = bsxfun(@ge,X,q(i,:));
        l = bsxfun(@le,X,q(i+1,:));
    end
    Y(g & l) = i;
end

Is there anything I can do to speed this up? Can the code be vectorized?

asked Dec 22 '11 09:12

Chris Taylor

2 Answers

If I understand correctly, you want to know how many items fell in each bucket. Use:

n = hist(Y,nbins)

Though I am not sure that it will help in the speedup. It is just cleaner this way.

Edit, following the comment:

You can use the second output of histc:

[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
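
As a rough sketch of how that second output could replace the bucket loop (this is not from the answer itself): histc puts a value equal to the last edge into an extra bin, nLevels+1, so the sketch folds it back into the top bucket with min. It assumes X is a matrix with one variable per column and at least two columns, so that quantile returns one column of edges per variable. The sample data is made up for illustration.

```matlab
% Made-up sample data: 6 observations, 2 variables (one per column).
X = [3 10; 1 40; 4 20; 6 60; 7 50; 2 30];
nLevels = 2;

% Bucket edges per column: q is (nLevels+1)-by-size(X,2).
q = quantile(X, linspace(0, 1, nLevels + 1));

Y = zeros(size(X));
for j = 1:size(X, 2)
    % bin(k) is the index of the edge interval that X(k,j) falls into.
    [~, bin] = histc(X(:, j), q(:, j));
    % Entries equal to the last edge land in bin nLevels+1;
    % fold them into the top bucket, as the original loop does.
    Y(:, j) = min(bin, nLevels);
end
% For this X, Y(:,1)' is [1 1 2 2 2 1].
```

The loop now runs over columns rather than buckets. In newer MATLAB releases histc is discouraged in favour of histcounts and discretize, but the idea is the same.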

answered Oct 07 '22 22:10

Andrey Rubshtein

How about this:

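function [Y q] = levels(X,nLevels)
% Works for vector X only: q(i) picks out a single bucket edge.

p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
    % Parentheses are required: + binds tighter than >=,
    % so Y+ X>=q(i) would be evaluated as (Y+X) >= q(i).
    Y = Y + (X >= q(i));
end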

This results in the following:

>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)

Y =

     1  1  2  2  2  1

q =

     1  3.5  7

You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
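
For matrix input like in the question, the same counting idea can be applied per column without a loop over buckets. The following is only a sketch under assumptions: X has one variable per column (at least two columns), and the sample data is made up. For each entry it counts how many lower bucket edges are less than or equal to it. The last edge is left out, so the column maximum ends up in bucket nLevels.

```matlab
% Made-up sample data: 6 observations, 2 variables (one per column).
X = [3 10; 1 40; 4 20; 6 60; 7 50; 2 30];
nLevels = 2;

% Bucket edges per column: q is (nLevels+1)-by-size(X,2).
q = quantile(X, linspace(0, 1, nLevels + 1));

% Move the nLevels lower edges into the third dimension:
% lowerEdges is 1-by-N-by-nLevels.
lowerEdges = permute(q(1:nLevels, :), [3 2 1]);

% Compare every entry with every lower edge of its own column,
% then count the edges it reaches. The result is M-by-N.
Y = sum(bsxfun(@ge, X, lowerEdges), 3);
```

The intermediate logical array has size M-by-N-by-nLevels, so this trades memory for the loop. Whether it is actually faster would have to be checked with the profiler.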

answered Oct 07 '22 23:10

Aero Engy
