In Matlab, this type of algorithm ("growing arrays") is advised against:
mine = []
for i=1:100,
mine = [mine,randn(1)]
end
whereas it seems that many Python examples show this kind of algorithm (this is a really bad example, though):
import numpy.random as rand
mine = []
for i in range(100):
    mine.append(rand.random(1)[0])
I wonder why that is -- what is the difference?
The difference is largely cultural: MATLAB's basic numeric type is the matrix, which is stored in one contiguous block of memory, so growing it element by element means repeatedly reallocating and copying. Python's general-purpose list is designed for exactly this kind of incremental building; for serious numeric work an ndarray would be used instead, and an ndarray would offer exactly the same tradeoffs as a MATLAB matrix.
Appending to arrays in MATLAB is apparently very inefficient (building an array that way runs in quadratic time), whereas the corresponding list operation in Python is much more highly optimized. Appends are O(1) until the list becomes full, at which point its underlying allocation is enlarged to make more room (an O(n) operation). Because the allocation grows geometrically, the amortized cost per append stays O(1). These kinds of optimizations could probably also be achieved in MATLAB, but it seems they are not done automatically.
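As a quick sketch of that amortized behaviour (my own illustration, not from the answer; timings will vary by machine), the total time to append n items grows roughly linearly with n, so the cost per append stays flat:
from timeit import timeit

# Time n consecutive appends to one list; the per-append cost should stay
# roughly constant even as n grows by factors of ten.
for n in (10_000, 100_000, 1_000_000):
    t = timeit('x.append(99)', 'x = []', number=n)
    print('%9d appends: %.4f s total, %.1f ns each' % (n, t, t / n * 1e9))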
For even better performance, Python also has the collections.deque container class, which supports efficient appending and removal at either end (both operations are O(1)).
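A minimal usage sketch (my own, not part of the answer above) of deque's two-ended appends:
from collections import deque

d = deque()
d.append('right')       # O(1) append at the right end
d.appendleft('left')    # O(1) append at the left end
print(d)                # deque(['left', 'right'])
print(d.popleft())      # O(1) removal from the left end -> 'left'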
Your two examples are not exactly equivalent. The MATLAB example concatenates two arrays into a new array, making a copy each time, whereas the Python code appends items to a list without copying it. You can in fact write an exact Python equivalent of the MATLAB code, e.g.:
mine = mine + [newitem]
But you shouldn't do that, because it makes a copy of an ever-growing list on every iteration. That is why lists have an .append() method (and also .extend()).
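To make the cost difference concrete, here is a rough sketch (my own, timings machine-dependent) comparing the copying version with .append():
from timeit import timeit

# Copying version: 'mine = mine + [99]' builds a brand-new list every pass.
concat_time = timeit('mine = mine + [99]', 'mine = []', number=20000)
# In-place version: 'mine.append(99)' just adds to the existing list.
append_time = timeit('mine.append(99)', 'mine = []', number=20000)
print('concatenation: %.3f s, append: %.3f s' % (concat_time, append_time))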
For similar reasons, rather than building up a string by repeated concatenation, Pythonistas recommend appending the individual strings to a list and then using ''.join() on it.
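For example (a small sketch of the idiom, not taken from the answer):
parts = []
for i in range(5):
    parts.append(str(i))            # collect the pieces in a list...
result = ''.join(parts)             # ...then build the string once at the end
print(result)                       # prints 01234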
By the way, Python lists are always allocated with space for extra items, so that they do not always need to be grown when a new item is appended.
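You can see that spare capacity with sys.getsizeof (an illustrative sketch; the exact byte counts depend on the CPython version):
import sys

x = []
last_size = sys.getsizeof(x)
for i in range(20):
    x.append(i)
    size = sys.getsizeof(x)
    if size != last_size:           # the allocation only grows occasionally...
        print('len=%2d  allocated bytes=%d' % (len(x), size))
        last_size = size            # ...and leaves room for several more items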
Your MATLAB code can be better written. Compare the following implementations:
%# preallocation
tic
x = zeros(1,100000); for i=1:100000, x(i) = 99; end
toc
%# appending
tic
x = []; for i=1:100000, x(end+1) = 99; end
toc
%# concatenation
tic
x = []; for i=1:100000, x = [x, 99]; end
toc
I get the following results:
Elapsed time is 0.001844 seconds. %# preallocated vector/matrix
Elapsed time is 0.109597 seconds. %# appending using "end+1" syntax
Elapsed time is 35.226361 seconds. %# appending using concatenation
Note that the above was tested on R2011b, which introduced performance improvements for automatically growing matrices (without pre-allocation).
You should also check this previous answer for a solution that combines preallocation with dynamic growth (the idea is to allocate/grow in large block sizes); a rough sketch of that idea follows below.
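A rough Python/NumPy sketch of that block-growing idea (the function name, initial capacity, and doubling policy here are my own hypothetical choices, not taken from that answer):
import numpy as np

def grow_in_blocks(n_items, initial_capacity=1024):
    # Keep a preallocated buffer and double it whenever it fills up,
    # so the array is reallocated only O(log n) times instead of n times.
    buf = np.zeros(initial_capacity)
    count = 0
    for value in np.random.randn(n_items):
        if count == buf.size:                                # buffer is full
            buf = np.concatenate([buf, np.zeros(buf.size)])  # grow in one big block
        buf[count] = value
        count += 1
    return buf[:count]                                       # trim the unused tail

print(grow_in_blocks(5000).shape)                            # (5000,)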
On the other hand, note that Python lists are optimized for appending items at the end. If you insert items at the beginning instead, you will get very different timings. Example:
>>> from timeit import timeit
>>> timeit('x.insert(0,99)', 'x=[]', number=100000)
5.3840245059078597
>>> timeit('x.append(99)', 'x=[]', number=100000) # x.insert(len(x),99)
0.039047700196533697
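For completeness, a small sketch (again machine-dependent, not from the answer) of the same experiment with collections.deque, whose appendleft does not slow down the way list.insert(0, ...) does:
from timeit import timeit

# Inserting at the front of a deque stays cheap; compare with the
# list.insert(0, ...) timing above.
print(timeit('d.appendleft(99)',
             'from collections import deque; d = deque()', number=100000))
print(timeit('x.insert(0, 99)', 'x = []', number=100000))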