Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Code optimisation: Arrays vs collections

In terms of memory consumption/ execution time, what's a more expensive way of adding items to a group

Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal

Or

myCollection.Add myVal

Does the fastest one depend on myVal, or vary with the size of the groups? Is there a faster method still?

I have the array/collection declared Privately in the declarations portion of a class if that makes a difference, but I'd like to know what is happening behind the scenes and which approach is generally faster (not in terms of readability or maintainability, just execution time)

Tests

OK, having run some tests adding many instances of 1 to groups and collections, my results are:

  • collection method 10 x faster than a variant array
  • collection method 5 x faster than a long array
  • collection method 1.5 x faster than a byte array

Results were all for approx 5 seconds of looping with this code:

Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
    u = UBound(a) + 1
    ReDim Preserve a(1 To u)
    a(u) = 1
    'or c.Add 1
    i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub

So it looks like for adding that item, with relatively large groups; adding to a collection is faster, but I'd like to know why?

like image 614
Greedo Avatar asked Aug 17 '17 16:08

Greedo


1 Answers

ReDim Preserve is skewing it all.

ReDim Preserve myArray(1 To UBound(myArray) + 1)

That's inherently inefficient, and an unfair comparison. You're copying the entire array internally, every time you add an item. I would hope a Collection is much more efficient than that. If not, use .NET's ArrayList, which is deprecated in .NET since v2.0 introduced generics and List<T>, but usable - and useful - in VBA (.NET generics can't be used in VBA).

An ArrayList doesn't resize its internal _items array every single time an item is added! Notice the comments:

// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public virtual int Add(Object value) {
    Contract.Ensures(Contract.Result<int>() >= 0);
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size] = value;
    _version++;
    return _size++;
}

...

// Ensures that the capacity of this list is at least the given minimum
// value. If the currect capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

https://referencesource.microsoft.com/#mscorlib/system/collections/arraylist.cs

I don't know about the internals of a VBA.Collection, but if I had to guess, I would say it likely has a similar mechanism that avoids re-dimensioning the internal array every single time an item is added. But that's all moot, since nobody except Microsoft knows how VBA.Collection is implemented.

What we can do though, is run benchmarks and compare - let's add one million values to an array, a collection, and heck, an ArrayList:

Public Sub TestReDimArray()
    Dim sut() As Variant
    ReDim sut(0 To 0)
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While UBound(sut) < 1000000
        ReDim Preserve sut(0 To i)
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestCollection()
    Dim sut As VBA.Collection
    Set sut = New VBA.Collection
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestArrayList()
    Dim sut As Object
    Set sut = CreateObject("System.Collections.ArrayList")
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub

Here's the output:

ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.

Note that referencing the 32-bit mscorlib.tlb and early-binding the ArrayList doesn't make much of a difference. Plus there's managed/COM interop overhead, and VBA doesn't support constructors, so the ArrayList initializes with a capacity of 4 that doubles every time capacity is reached, i.e. when we insert the millionth item we've resized the internal array 19 times and end up with an internal capacity of 1,048,576 items.

So how is Collection winning by that much anyway?

Because the array is being abused: resizing isn't what arrays do best, and resizing before every insert can't possibly go well.

When to use arrays?

Use arrays when you know the number of elements in advance:

Public Sub TestPopulateArray()
    Dim sut(0 To 999999) As Variant
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub

Output:

PopulateArray added 1M items in 0.0234375 seconds.

That's roughly 10 times faster than adding the same number of items to a VBA.Collection - well-used arrays are blazingly fast.


TL;DR

Keep array resizing to a minimum - avoid it as much as possible. If you don't know the number of items you're going to end up with, use a Collection. If you do, use an explicitly sized Array.

like image 93
Mathieu Guindon Avatar answered Sep 27 '22 00:09

Mathieu Guindon