Why does the classic implementation of Vector (ArrayList for Java people) double its internal array size on each expansion instead of tripling or quadrupling it?
By default, Vector doubles its capacity when it runs out of room. However, if a capacity increment is specified in its constructor, Vector grows by that fixed amount on each allocation cycle instead.
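For illustration, a minimal sketch of both growth modes using the real java.util.Vector API (the class name is mine; the printed capacities assume the stock implementation, whose default constructor starts at capacity 10):

```java
import java.util.Vector;

public class VectorGrowthDemo {
    public static void main(String[] args) {
        Vector<Integer> doubling = new Vector<>();      // default: capacity doubles when full
        Vector<Integer> stepped  = new Vector<>(10, 5); // fixed capacityIncrement of 5

        for (int i = 0; i < 100; i++) {
            doubling.add(i);
            stepped.add(i);
        }
        // Doubling: 10 -> 20 -> 40 -> 80 -> 160
        System.out.println(doubling.capacity()); // 160
        // Fixed increment: 10 -> 15 -> 20 -> ... -> 100
        System.out.println(stepped.capacity());  // 100
    }
}
```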
If you grow the array one element at a time, you end up with quadratic behaviour, because you copy all existing elements from an array of size n into one of size n+1, then n+2, and so on. Doubling reduces the total cost to linear time. This growth strategy is what gives you the so-called "amortized constant time" insertion.
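A stripped-down sketch of the doubling strategy (illustrative only, not the real ArrayList source):

```java
import java.util.Arrays;

/** Minimal growable int array illustrating the doubling strategy. */
class IntVector {
    private int[] data = new int[1];
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Doubling means each element is copied O(1) times on average,
            // so n appends cost O(n) total rather than O(n^2).
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) { return data[index]; }
    int size() { return size; }
}
```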
When calculating the average time to insert into a vector, you need to allow for the non-growing inserts and the growing inserts.
Call the total number of operations needed to insert n items o_total, and the average per insert o_average = o_total / n.
If you insert n items, growing by a factor of A whenever the array fills, there are

o_total = n + Σ A^i  for 0 < i < 1 + log_A(n)

operations. In the worst case you only use 1/A of the allocated storage.
Intuitively, A = 2 means the resize copies sum to at most about 2n, so at worst o_total ≈ 3n; o_average is therefore O(1), and in the worst case you use 50% of the allocated storage.
For a larger A, o_total is lower, but you waste more storage.
For a smaller A, o_total is larger, but you don't waste as much storage. As long as the capacity grows geometrically, insertion is still O(1) amortised; only the constant gets higher.
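To put numbers on the trade-off, here is a small simulation sketch (my own accounting: one operation per insert plus one per element copied at each resize; zero-filling is ignored for now):

```java
public class GrowthFactorCost {
    public static void main(String[] args) {
        int n = 400_000;
        for (double a : new double[] {1.25, 1.5, 2.0, 3.0, 4.0}) {
            long ops = 0;
            int capacity = 1;
            int size = 0;
            for (int i = 0; i < n; i++) {
                if (size == capacity) {
                    ops += size;  // copy the old array into the new one
                    capacity = Math.max(capacity + 1, (int) (capacity * a));
                }
                ops++;            // the insert itself
                size++;
            }
            System.out.printf("A=%.2f  ops/insert=%.2f  space used=%3.0f%%%n",
                    a, (double) ops / n, 100.0 * size / capacity);
        }
    }
}
```

Depending on where n lands relative to the last resize, ops-per-insert comes out between roughly 1 + 1/(A − 1) and 1 + A/(A − 1), i.e. between about 2 and 3 for A = 2, consistent with the formula above.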
For growth factors 1.25 (red), 1.5 (cyan), 2 (black), 3 (blue) and 4 (green), the graphs plot point and average size efficiency (ratio of size to allocated space; higher is better) on the left, and time efficiency (ratio of insertions to operations; higher is better) on the right, for inserting 400,000 items. All growth factors reach 100% space efficiency just prior to resizing; the A = 2 case shows time efficiency between 25% and 50% and space efficiency around 50%, which is a good balance for most cases.
For runtimes such as Java, arrays are zero-filled on allocation, so the number of operations needed to allocate an array is proportional to its size. Taking this into account reduces the difference between the time-efficiency estimates.
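One back-of-envelope way to fold this in (my assumption about the cost accounting, not a reading from the graphs): a resize to new capacity c now costs about c zeroing writes on top of the c/A element copies. With copies summing to roughly n/(A − 1) and zero-fills to roughly An/(A − 1), time efficiency for A = 2 falls from about n/2n = 50% toward n/4n = 25%, while for A = 4 it falls from about 75% to about 37%, so the gap between growth factors roughly halves.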
Any multiple is a compromise. Make it too big and you waste too much memory; make it too small and you waste too much time on reallocation and copying. I guess doubling is used because it works and is very easy to implement. I have also seen a proprietary STL-like library that uses 1.5 as the multiplier for the same job; presumably its developers considered doubling too wasteful of memory.