Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace all zeros in vector by previous non-zero value

Matlab/Octave algorithm example:

 input vector: [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] output vector: [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ] 

The algorithm is very simple: it goes through the vector and replaces all zeros with the last non-zero value. It seems trivial, and is so when done with a slow for (i=1:length) loop and being able to refer to the previous element (i-1), but looks impossible to be formulated in the fast vectorized form. I tried the merge() and shift() but it only works for the first occurrence of zero, not an arbitrary number of them.

Can it be done in a vectorized form in Octave/Matlab or must C be used for this to have sufficient performance on big amount of data?


I have another similar slow for-loop algorithm to speed up and it seems generally impossible to refer to previous values in a vectorized form, like an SQL lag() or group by or loop (i-1) would easily do. But Octave/Matlab loops are terribly slow.

Has anyone found a solution to this general problem or is this futile for fundamental Octave/Matlab design reasons?


Performance benchmark:

SOLUTION 1 (slow loop)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000); out = in; tic for i=2:length(out)     if (out(i)==0)        out(i)=out(i-1);    end end toc [in(1:20); out(1:20)] % test to show side by side if ok 

Elapsed time is 15.047 seconds.

SOLUTION 2 by Dan (~80 times faster)

in = V = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000); tic; d = double(diff([0,V])>0); d(find(d(2:end))+1) = find(diff([0,~V])==-1) - find(diff([0,~V])==1); out = V(cumsum(~~V+d)-1); toc; [in(1:20); out(1:20)] % shows it works ok 

Elapsed time is 0.188167 seconds.

15.047 / 0.188167 = 79.97 times improvement

SOLUTION 3 by GameOfThrows (~115 times faster)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000); a = in; tic; pada = [a,888]; b = pada(pada >0); bb = b(:,1:end-1); c = find (pada==0); d = find(pada>0); len = d(2:end) - (d(1:end-1)); t = accumarray(cumsum([1,len])',1); out = bb(cumsum(t(1:end-1))); toc; 

Elapsed time is 0.130558 seconds.

15.047 / 0.130558 = 115.25 times improvement

Magical SOLUTION 4 by Luis Mendo (~250 times faster)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] , 1, 100000); tic; u = nonzeros(in); out = u(cumsum(in~=0)).'; toc; 

Elapsed time is 0.0597501 seconds.

15.047 / 0.0597501 = 251.83 times improvement


(Update 2019/03/13) Timings with MATLAB R2017a:

Slow loop:    0.010862 seconds. Dan:          0.072561 seconds. GameOfThrows: 0.066282 seconds. Luis Mendo:   0.032257 seconds. fillmissing:  0.053366 seconds. 

So we draw yet again the same conclusion: loops in MATLAB are no longer slow!


See also: Trivial/impossible algorithm challenge in Octave/Matlab Part II: iterations memory

like image 708
Paweł Załuski Avatar asked Dec 02 '15 11:12

Paweł Załuski


1 Answers

The following simple approach does what you want, and is probably very fast:

in = [1 0 2 0 7 7 7 0 5 0 0 0 9]; t = cumsum(in~=0); u = nonzeros(in); out = u(t).'; 
like image 76
Luis Mendo Avatar answered Sep 24 '22 07:09

Luis Mendo