Appending rows/columns to a matrix in MATLAB seems quite something to consider. For example, when I tried appending a column to a matrix A
which has many rows and many columns, like
A = [A, added_col]
Matlab will warn me that since this has to make a copy of A in the memory, I'd better use pre-allocation for speed. This is understandable, because the underlying data in A
occupies a contiguous block of memory.
My question is, will removing rows/columns cause similar issues? For example, to remove the second row of A
:
A(2,:) = []
Is this operation in-place or not? I really feel unsure, since for one thing it doesn't seem to make any new room for data in the memory, and for another, the rows of A
will be stored non-contiguously (since row 2 is removed).
So what will happen internally? And is this operation efficient enough to use in practice? Thanks!
Just tested it with a complexity of 100000
:
clc; clear;
N = 100000;
A = zeros(N, 3);
t1 = tic;
for ii = 1:N
A(ii, :) = [1 2 3];
end
t2 = toc;
And
clc; clear;
N = 100000;
A = zeros(N, 3);
t1 = tic;
for ii = (N-1):-1:2
A(ii, :) = [];
end
t2 = toc;
Results: 0.009s for the first (modifying preallocated matrix) and 53.429s for the second (removing rows from a matrix). I think this basically settles this question: NO, removing rows/columns from a matrix is NOT efficient to use, since it definitely involves deepcopying data and reallocating memory.
Also, removing columns instead of rows isn't a good idea either. As I tested, it still takes as good as approx two minutes, on the above complexity scale:
N = 100000;
test_m = zeros(3, N);
tic
for ii = (N - 1):-1:2
test_m(:, ii) = [];
end
toc
% result: 105.436595 seconds.
% This was run on a different machine than the previous examples.
% But is still enough evidence that dynamically resizing a big matrix is a BAD idea.
So, end of the story: Don't try to remove columns or rows in this way, unless you have a really small matrix. For bulky matrices, always use preallocation instead.
There are a few issues here:
1) Without using mex, you have no control over whether matrix removal uses the same memory or not. However, you can tell if it happens or not. One approach would be to write something in mex. Alternatively you can activate format debug
N = 100000;
test_m = zeros(3, N);
t = evalc('disp(test_m)');
disp(t(1:100))
test_m(:,2:N-1) = [];
disp(t(1:100))
This yields the output
Structure address = 1259f6da0
m = 3
n = 100000
pr = 15d6d2020
pi = 0
Columns 1 through 9
0
Structure address = 1259f6b70
m = 3
n = 2
pr = 608001c95320
pi = 0
0 0
0 0
0
0
Note that to keep the display reasonable I capture the output of displaying the variable and then only show part of it. This output, particularly pr
(pointer to real data) indicates that reallocation occurred. I wasn't able to find any situation where reallocation didn't occur.
2) As mentioned in some of the comments and alluded to in the question, memory is stored as column-major. Thus when you delete a column it might be more efficient than deleting a row ...
3) I'm not entirely sure if the code examples are realistic but it makes a lot more sense to remove all columns or rows in one go. This occurs really quickly.
N = 100000;
test_m = zeros(3, N);
test_m(:,2:N-1) = [];
4) Finally, I'm not sure what your wording refers to with preallocating for modifications vs removal. Big picture it is probably best to avoid removing rows or columns in a loop. Instead hold onto an array that indicates which columns or rows should be removed and then do it in one pass.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With