Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is reshape so fast? (Spoiler: Copy-on-Write)

I have a big matrix A which is 1GB of double values, when I reshape it to different dimensions, it's incredible fast.

A=rand(128,1024,1024);
tic;B=reshape(A,1024,128,1024);toc

Elapsed time is 0.000011 seconds.

How can it be that fast? Another observation, MATLAB uses less memory than it should after running that code and storing two matrices of 1GB each: Memory used by MATLAB: 1878 MB (1.969e+09 bytes)

like image 532
Daniel Avatar asked Mar 17 '16 13:03

Daniel


1 Answers

Explanation of the good performance

Matlab uses copy-on-write whenever possible. If you write expressions like B=A, MATLAB does not copy A, instead both variables A and B are references to the same data structure. Only if one of the two variables will be modified, MATLAB will create a copy.

Now to the special case of reshape. Here it looks like A and B are not the same, but in memory they are. The underlying array which holds the data is unaffected by the reshape operation, nothing has to be moved: all(A(:)==B(:)). Everything MATLAB has to do when calling reshape is to create a new reference and annotate it with the new dimensions of the matrix. Reshaping a matrix is nothing more than creating a new reference to the input data, which is annotated with the new dimensions. The runtime of reshape is less than 1µs or roughly the time two simple assignments like B=A require. For all practical applications a zero time operation.

>> tic;for i=1:1000;B=reshape(A,1024,128,1024);end;toc
Elapsed time is 0.000724 seconds.
>> tic;for i=1:1000;B=A;end;toc
Elapsed time is 0.000307 seconds.

It is unknown how large such a reference really is, but we can assume it to be within a few bytes.

Other zero cost operations

Functions known to have practically zero cost (both runtime and memory):

  • B=reshape(A,sz)
  • B=A(:)
  • B=A.' - only for Vectors
  • B=A' - only for Vectors of real numbers, without the attribute complex. Use .' instead.
  • B=permute(A,p) - only for the cases where all(A(:)==B(:)).1
  • B=ipermute(A,p) - only for the cases where all(A(:)==B(:)).1
  • B=squeeze(A) 1
  • shiftdim - only for the cases where all(A(:)==B(:)), which are:1
    • used to remove leading singleton dimensions.
    • used with negative second input
    • used without second input argument.

Functions which are "expensive", regardless of the fact that they don't touch the representation in memory (all(A(:)==B(:)) is true)

  • Left sided indexing: B(1:numel(A))=A; 2
  • Right sided indexing other than (:), including B=A(1:end); and B=A(:,:,:); 2

1 Significantly slower runtime than reshape between 1µs and 1ms. Probably because of some constant computation overhead. Memory consumption is practically zero and the runtime is independent from the input size. Operations without this annotation have a runtime below 1µs and roughly equivalent to reshape.

2 Zero cost in OCTAVE

Originally used MATLAB 2013b when writing this post. Confirmed the numbers with MATLAB 2019b.

like image 98
Daniel Avatar answered Nov 02 '22 19:11

Daniel