MATLAB repeat numbers based on a vector of lengths

Is there a vectorised way to do the following (shown by an example)?

input_lengths = [ 1 1 1 4       3     2   1 ]
result =        [ 1 2 3 4 4 4 4 5 5 5 6 6 7 ]

I have spaced out input_lengths so that it is easy to see how result is obtained.

The resulting vector has length sum(input_lengths). I currently calculate result using the following loop:

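result = ones(1, sum(input_lengths));    % preallocate the output
counter = 1;                             % first index of the current run
for i = 1:length(input_lengths)
    start_index = counter;
    end_index = counter + input_lengths(i) - 1;

    result(start_index:end_index) = i;   % fill run i with the value i
    counter = end_index + 1;             % the next run starts right after
end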

EDIT:

I can also do this using arrayfun, although that is not really a vectorised function:

cell_result = arrayfun(@(x) repmat(x, 1, input_lengths(x)), 1:length(input_lengths), 'UniformOutput', false);
% cell_result = {[1], [2], [3], [4 4 4 4], [5 5 5], [6 6], [7]}

result = [cell_result{:}];
% result = [1 2 3 4 4 4 4 5 5 5 6 6 7]
asked May 15 '14 at 06:05 by Samuel O'Malley
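A note for later readers: since MATLAB R2015a, which came out after this question was asked, the built-in repelem accepts a vector of repeat counts with the same length as its first argument, so the construction in the question reduces to a single call. This is a sketch assuming a release that has repelem; the benchmarks below predate it and do not include it.

```matlab
% Assumes MATLAB R2015a or later (repelem does not exist in older releases).
input_lengths = [1 1 1 4 3 2 1];

% Repeat each index k = 1..numel(input_lengths) exactly input_lengths(k) times.
% A row vector as first argument gives a row vector result.
result = repelem(1:numel(input_lengths), input_lengths);
% result = [1 2 3 4 4 4 4 5 5 5 6 6 7]
```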



3 Answers

A fully vectorized version:

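% selector(k,j) is true when k <= input_lengths(j)
selector=bsxfun(@le,[1:max(input_lengths)]',input_lengths);
% every row of V is 1:numel(input_lengths), so column j is filled with j
V=repmat([1:size(selector,2)],size(selector,1),1);
% logical indexing keeps the selected entries in column-major order
result=V(selector);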

The downside is that memory usage is O(numel(input_lengths)*max(input_lengths)), since selector and V both have max(input_lengths) rows and numel(input_lengths) columns.
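To see what the two intermediate matrices hold, here is the same code traced on a small input (input_lengths = [1 3 2], chosen only for illustration). Note that logical indexing returns a column vector here, whereas the question's loop produces a row vector.

```matlab
input_lengths = [1 3 2];   % hypothetical small input for the trace

% Column j gets input_lengths(j) leading trues, padded with falses
% down to max(input_lengths) rows.
selector = bsxfun(@le, (1:max(input_lengths))', input_lengths);
% selector = [1 1 1
%             0 1 1
%             0 1 0]

% Every row of V is 1:numel(input_lengths).
V = repmat(1:size(selector,2), size(selector,1), 1);

% Walking V column by column and keeping the true positions yields
% one 1, three 2s and two 3s, as a column vector.
result = V(selector);
% result = [1; 2; 2; 2; 3; 3]
```

On R2016b and later, implicit expansion lets `(1:max(input_lengths))' <= input_lengths` replace the bsxfun call; append `.'` to result if a row vector is needed. The trace also shows where the memory cost comes from: both matrices are max(input_lengths)-by-numel(input_lengths), however few entries are actually selected.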

answered Oct 24 '22 at 06:10 by Daniel


Benchmark of all solutions

Following the previous benchmark, I grouped all the solutions given here into one script and ran it for a few hours. I did this because I think it is useful to see how each proposed solution performs as a function of the input length. My intention is not to put down the previous benchmark, which gives additional information about the effect of the JIT. Moreover, as every participant seems to agree, all the answers are good work, so this great post deserves a concluding one.

I won't post the script here; it is quite long and uninteresting. The benchmark runs each solution on input vectors of the following lengths: 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000. For each input length, I generated a random input vector from exponentially distributed values (the inter-arrival times of a Poisson process) with rate 0.8, rounded and shifted by 1 so that every length is at least 1 and big values are rare:

input_lengths = round(-log(1-rand(1,ILen(i)))/poisson_alpha)+1;

Finally, I average the computation times over 100 runs per input length.

I ran the script on my laptop (Core i7) with MATLAB R2013b, with the JIT enabled.
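The script itself isn't posted, so the following is only a hypothetical sketch of the procedure described above, not the script behind the numbers below. It assumes poisson_alpha = 0.8, times just two of the solutions (the question's for-loop and the cumsum variant given later in this answer), uses a subset of the listed input lengths, and averages tic/toc timings over 100 runs per length.

```matlab
% Hypothetical benchmark sketch; not the author's script.
ILen = [10 100 1000 10000];       % subset of the input lengths listed above
poisson_alpha = 0.8;              % rate parameter stated in the text
nRuns = 100;                      % runs averaged per input length
avgTime = zeros(2, numel(ILen));  % row 1: for-loop, row 2: cumsum variant

for i = 1:numel(ILen)
    % Random lengths >= 1, using the generator formula quoted above
    input_lengths = round(-log(1-rand(1,ILen(i)))/poisson_alpha)+1;
    for trial = 1:nRuns
        % The question's for-loop
        t = tic;
        r1 = ones(1, sum(input_lengths));
        counter = 1;
        for k = 1:numel(input_lengths)
            end_index = counter + input_lengths(k) - 1;
            r1(counter:end_index) = k;
            counter = end_index + 1;
        end
        avgTime(1,i) = avgTime(1,i) + toc(t)/nRuns;

        % The cumsum variant
        t = tic;
        r2 = zeros(1, sum(input_lengths));
        r2(1) = 1;
        r2(1+cumsum(input_lengths(1:end-1))) = 1;
        r2 = cumsum(r2);
        avgTime(2,i) = avgTime(2,i) + toc(t)/nRuns;

        % Timings only mean something if both give the same vector
        assert(isequal(r1, r2));
    end
end
disp(avgTime)   % mean seconds per run, one column per input length
```

Timing both solutions on the same random vector inside one trial keeps the comparison fair; the built-in timeit function is an alternative that handles warm-up and repetition itself.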

Here are the plotted results on a log-log scale (x-axis: input length; y-axis: computation time in seconds); apologies for the line colors:

[Figure: benchmark over 100 trials, all solutions]

So Luis Mendo is the clear winner, congrats!

For anyone who wants the numerical results or wants to replot them, here they are: computation time in seconds for each input length N, split into two tables and rounded to 3 significant digits for display.

N                   10          20          50          100         200         500         1e+03       2e+03
-------------------------------------------------------------------------------------------------------------
OP's for-loop       8.02e-05    0.000133    0.00029     0.00036     0.000581    0.00137     0.00248     0.00542 
OP's arrayfun       0.00072     0.00117     0.00255     0.00326     0.00514     0.0124      0.0222      0.047
Daniel              0.000132    0.000132    0.000148    0.000118    0.000126    0.000325    0.000397    0.000651
Divakar             0.00012     0.000114    0.000132    0.000106    0.000115    0.000292    0.000367    0.000641
David's for-loop    9.15e-05    0.000149    0.000322    0.00041     0.000654    0.00157     0.00275     0.00622
David's arrayfun    0.00052     0.000761    0.00152     0.00188     0.0029      0.00689     0.0122      0.0272
Luis Mendo          4.15e-05    4.37e-05    4.66e-05    3.49e-05    3.36e-05    4.37e-05    5.87e-05    0.000108
Bentoy13's cumsum   0.000104    0.000107    0.000111    7.9e-05     7.19e-05    8.69e-05    0.000102    0.000165
Bentoy13's sparse   8.9e-05     8.82e-05    9.23e-05    6.78e-05    6.44e-05    8.61e-05    0.000114    0.0002
Luis Mendo's optim. 3.99e-05    3.96e-05    4.08e-05    4.3e-05     4.61e-05    5.86e-05    7.66e-05    0.000111

N                   5e+03       1e+04       2e+04       5e+04       1e+05       2e+05       5e+05       1e+06
-------------------------------------------------------------------------------------------------------------
OP's for-loop       0.0138      0.0278      0.0588      0.16        0.264       0.525       1.35        2.73
OP's arrayfun       0.118       0.239       0.533       1.46        2.42        4.83        12.2        24.8
Daniel              0.00105     0.0021      0.00461     0.0138      0.0242      0.0504      0.126       0.264
Divakar             0.00127     0.00284     0.00655     0.0203      0.0335      0.0684      0.185       0.396
David's for-loop    0.015       0.0286      0.065       0.175       0.3         0.605       1.56        3.16
David's arrayfun    0.0668      0.129       0.299       0.803       1.33        2.64        6.76        13.6
Luis Mendo          0.000236    0.000446    0.000863    0.00221     0.0049      0.0118      0.0299      0.0637
Bentoy13's cumsum   0.000318    0.000638    0.00107     0.00261     0.00498     0.0114      0.0283      0.0526
Bentoy13's sparse   0.000414    0.000774    0.00148     0.00451     0.00814     0.0191      0.0441      0.0877
Luis Mendo's optim. 0.000224    0.000413    0.000754    0.00207     0.00353     0.00832     0.0216      0.0441

OK, I've added another solution to the list: I could not stop myself from optimizing Luis Mendo's best-so-far solution. No credit for that; it's just a variant of his, which I explain below.

Clearly, the solutions using arrayfun are very time-consuming. The solutions using an explicit for loop are faster, yet still slow compared with the other solutions: at N = 10^6, the OP's arrayfun takes 24.8 s and the OP's for-loop 2.73 s, against 0.0441 s for the fastest solution. So yes, vectorizing is still a major way to optimize a MATLAB script.

Since I saw a large dispersion in the computation times of the fastest solutions, especially for input lengths between 100 and 10000, I decided to benchmark them more precisely. So I set the four slowest solutions aside (sorry) and reran the benchmark on the 6 much faster ones. This second benchmark is identical except that I averaged over 1000 runs.

[Figure: benchmark over 1000 trials, best solutions]

(No table this time unless you really want one; the numbers are much the same as before.)

As was remarked earlier, Daniel's solution is a little faster than Divakar's, apparently because bsxfun with @times is slower than repmat. Still, on large inputs both are roughly 10 times faster than the for-loop solutions: clearly, vectorizing in MATLAB pays off.

Bentoy13's and Luis Mendo's solutions are very close: the first uses more instructions, but the second makes an extra allocation when concatenating 1 with cumsum(input_lengths(1:end-1)). That is why Bentoy13's solution tends to be a bit faster for big input lengths (from about 5×10^5), where there is no extra allocation. Based on this, I wrote an optimized version that avoids the extra allocation; here is the code (Luis Mendo can put it in his answer if he wants to :) ):

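result = zeros(1,sum(input_lengths));           % one slot per output element
result(1) = 1;                                  % the first run starts at index 1
result(1+cumsum(input_lengths(1:end-1))) = 1;   % mark the start of every later run
result = cumsum(result);                        % count of run starts so far = run index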
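Here is the same idea traced on the question's example, with the intermediate steps given names (starts and markers, introduced only for this illustration) so that each one can be inspected:

```matlab
input_lengths = [1 1 1 4 3 2 1];

% Start index of every run after the first: 1 + total length of earlier runs
starts = 1 + cumsum(input_lengths(1:end-1));
% starts = [2 3 4 8 11 13]

markers = zeros(1, sum(input_lengths));
markers(1) = 1;        % the first run starts at index 1
markers(starts) = 1;   % a 1 at the first index of every later run
% markers = [1 1 1 1 0 0 0 1 0 0 1 0 1]

% The running sum increases by one at each run start
result = cumsum(markers);
% result = [1 2 3 4 4 4 4 5 5 5 6 6 7]
```

The trick relies on every length being at least 1, which the benchmark's input generator guarantees through its +1. With a zero length, two markers land on the same index and the assignment records only one: input_lengths = [2 0 3] gives [1 1 2 2 2] instead of the loop's [1 1 3 3 3].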

Any comment for improvement is welcome.

answered Oct 24 '22 at 05:10 (5 revs, 2 users 98%)


This is more of a comment than an answer, but I did some tests. I wrote my own for loop and arrayfun versions and timed them against your for loop and arrayfun versions. Your for loop was the fastest. I think this is because it is simple and lets the JIT compiler optimise it the most. I am using MATLAB; Octave might behave differently.

And the timing:

Solution:     With JIT   Without JIT  
Sam for       0.74       1.22    
Sam arrayfun  2.85       2.85    
My for        0.62       2.57    
My arrayfun   1.27       3.81    
Divakar       0.26       0.28    
Bentoy        0.07       0.06    
Daniel        0.15       0.16
Luis Mendo    0.07       0.06

So Bentoy's code is really fast, and Luis Mendo's is almost exactly the same speed. And I rely on JIT way too much!


And here is the code for my attempts:

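clc, clear
input_lengths = randi(20, [1 10000]);   % 10000 random run lengths in 1..20

% My for loop
tic()
C = cumsum(input_lengths);              % C(i) = last index of run i
D = diff(C);                            % D(i-1) = input_lengths(i)
results = zeros(1, C(end));
results(1, 1:C(1)) = 1;                 % run 1
for i = 2:length(input_lengths)
    results(1, C(i-1)+1:C(i)) = i*ones(1, D(i-1));
end
toc()

% My arrayfun
tic()
A = arrayfun(@(i) i*ones(1, input_lengths(i)), 1:length(input_lengths), 'UniformOutput', false);
R = [A{:}];
toc()

isequal(results, R)                     % both versions should give the same vector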
answered Oct 24 '22 at 07:10 by David