 

Variable iterating on itself - different behavior with different types

Please take a look at the latest updates at the end of the post.

In particular, see Update 4: The Variant Comparison Curse.


I’ve seen colleagues bang their heads against the wall trying to understand how a Variant works, but I never imagined that I would have my own bad moment with it.

I have successfully used the following VBA construction:

For i = 1 to i

This works perfectly when i is an Integer or any numeric type, iterating from 1 to the original value of i. I do this on occasions where i is a ByVal parameter - you might say lazy - to spare myself the declaration of a new variable.
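A minimal sketch of this idiom, assuming a hypothetical procedure PrintUpTo with a ByVal parameter n (neither name comes from real code), could look like this:

```vba
' Hypothetical helper: n serves as both the loop counter and the upper bound.
Sub PrintUpTo(ByVal n As Long)
    ' With a numeric type, the end value is taken from the original n.
    For n = 1 To n
        Debug.Print n,
    Next
End Sub

Sub DemoPrintUpTo()
    Dim limit As Long
    limit = 4
    PrintUpTo limit      ' prints 1, 2, 3, 4 in the Immediate window
    Debug.Print limit    ' still 4, because n was passed ByVal
End Sub
```

Because n is passed ByVal, reusing it as the counter does not disturb the caller's variable.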

Then I had a bug when this construct “stopped” working as expected. After some hard debugging, I found that it doesn’t work the same way when i is declared not as an explicit numeric type but as a Variant. The question is twofold:

1- What are the exact semantics of the For and the For Each loops? I mean what is the sequence of actions that the compiler undertakes and in which order? For example, does the evaluation of the limit precede the initialization of the counter? Is this limit copied and “fixed” somewhere before the loop starts? Etc. The same question applies to For Each.

2- How to explain the different outcomes on variants and on explicit numeric types? Some say a variant is an (immutable) reference type, can this definition explain the observed behavior?

I have prepared an MCVE for different (independent) scenarios involving the For and the For Each statements, combined with integers, variants and objects. The surprising results call for an unambiguous definition of the semantics or, at the very least, a check of whether those results conform to the defined semantics.

All insights are welcome, including partial ones that explain some of the surprising results or their contradictions.

Thanks.

Sub testForLoops()
    Dim i As Integer, v As Variant, vv As Variant, obj As Object, rng As Range

    Debug.Print vbCrLf & "Case1 i --> i    ",
    i = 4
    For i = 1 To i
        Debug.Print i,      ' 1, 2, 3, 4
    Next

    Debug.Print vbCrLf & "Case2 i --> v    ",
    v = 4
    For i = 1 To v  ' (same if you use a variant counter: For vv = 1 to v)
        v = i - 1   ' <-- doesn't affect the loop's outcome
        Debug.Print i,          ' 1, 2, 3, 4
    Next

    Debug.Print vbCrLf & "Case3 v-3 <-- v ",
    v = 4
    For v = v To v - 3 Step -1
       Debug.Print v,           ' 4, 3, 2, 1
    Next

    Debug.Print vbCrLf & "Case4 v --> v-0 ",
    v = 4
    For v = 1 To v - 0
        Debug.Print v,          ' 1, 2, 3, 4
    Next

    '  So far so good? now the serious business

    Debug.Print vbCrLf & "Case5 v --> v    ",
    v = 4
    For v = 1 To v
        Debug.Print v,          ' 1      (yes, just 1)
    Next

    Debug.Print vbCrLf & "Testing For-Each"

    Debug.Print vbCrLf & "Case6 v in v[]",
    v = Array(1, 1, 1, 1)
    i = 1
    ' Any of the Commented lines below generates the same RT error:
    'For Each v In v  ' "This array is fixed or temporarily locked"
    For Each vv In v
        'v = 4
        'ReDim Preserve v(LBound(v) To UBound(v))
        If i < UBound(v) Then v(i + 1) = i + 1 ' so we can alter the entries in the array, but not the array itself
        i = i + 1
         Debug.Print vv,            ' 1, 2, 3, 4
    Next

    Debug.Print vbCrLf & "Case7 obj in col",
    Set obj = New Collection: For i = 1 To 4: obj.Add Cells(i, i): Next
    For Each obj In obj
        Debug.Print obj.Column,    ' 1 only ?
    Next

    Debug.Print vbCrLf & "Case8 var in col",
    Set v = New Collection: For i = 1 To 4: v.Add Cells(i, i): Next
    For Each v In v
        Debug.Print v.column,      ' nothing!
    Next

    ' Excel Range
    Debug.Print vbCrLf & "Case9 range as var",
    ' Same with collection? let's see
    Set v = Sheet1.Range("A1:D1") ' .Cells ok but not .Value => RT err array locked
    For Each v In v ' (implicit .Cells?)
        Debug.Print v.Column,       ' 1, 2, 3, 4
    Next

    ' Amazing for Excel, no need to declare two vars to iterate over a range
    Debug.Print vbCrLf & "Case10 range in range",
    Set rng = Range("A1:D1") '.Cells.Cells add as many as you want
    For Each rng In rng ' (another implicit .Cells here?)
        Debug.Print rng.Column,     ' 1, 2, 3, 4
    Next
End Sub

UPDATE 1

An interesting observation that can help in understanding some of this. Concerning cases 7 and 8: if we hold another reference to the collection being iterated, the behavior changes completely:

    Debug.Print vbCrLf & "Case7 modified",
    Set obj = New Collection: For i = 1 To 4: obj.Add Cells(i, i): Next
    Dim obj2: set obj2 = obj  ' <-- This changes the whole thing !!!
    For Each obj In obj
        Debug.Print obj.Column,    ' 1, 2, 3, 4 Now !!!
    Next

This means that in the initial case7 the collection being iterated was garbage-collected (due to reference counting) just after the variable obj was assigned to the first element of the collection. But this is still weird though. The compiler should have held some hidden reference on the object being iterated!? Compare this to case 6 where the array being iterated was "locked"...

UPDATE 2

The semantics of the For statement as defined by MSDN can be found on this page. You can see that it is explicitly stated that the end-value should be evaluated only once and before the execution of the loop proceeds. Should we consider this odd behavior as a compiler bug?

UPDATE 3

The intriguing case 7 again. The counter-intuitive behavior of case 7 is not restricted to the (admittedly unusual) iteration of a variable over itself. It can also happen in seemingly "innocent" code that, by mistake, removes the only reference to the collection being iterated, leading to its garbage collection.

Debug.Print vbCrLf & "Case7 Innocent"
Dim col As New Collection, member As Object, i As Long
For i = 1 To 4: col.Add Cells(i, i): Next
Dim someCondition As Boolean ' say some business rule that says change the col
For Each member In col
    someCondition = True
    If someCondition Then Set col = Nothing ' or New Collection
    ' now GC has killed the initial collection while being iterated
    ' If you had maintained another reference on it somewhere, the behavior would've been "normal"
    Debug.Print member.Column, ' 1 only
Next

By intuition one expects that some hidden reference is held on the collection to keep it alive during iteration. Not only is no such reference held, but the program also runs smoothly with no run-time error, which can lead to hard-to-find bugs. While the spec does not state any rule about manipulating objects under iteration, the implementation happens to protect and lock iterated arrays (case 6) but does not even hold a dummy reference on a Collection (nor on a Dictionary; I've tested that as well).

It's the responsibility of the programmer to care about the reference counting, which is not the "spirit" of VBA/VB6 and the architectural motivations behind reference counting.
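One defensive pattern that follows from Updates 1 and 3 is to iterate over a reference that the loop body never touches. The sketch below assumes it runs in Excel (it uses Cells); the names keepAlive and member are hypothetical. Since keepAlive still refers to the collection after col is cleared, the iteration should visit all four members, in line with the behavior reported in Update 1.

```vba
Sub IterateWithOwnReference()
    Dim col As Collection, keepAlive As Collection
    Dim member As Object, i As Long

    Set col = New Collection
    For i = 1 To 4: col.Add Cells(i, i): Next

    ' Second reference: the reference count cannot drop to zero
    ' while the loop below is running.
    Set keepAlive = col

    For Each member In keepAlive
        ' Simulates a business rule that resets the original variable.
        If member.Column = 1 Then Set col = Nothing
        Debug.Print member.Column,
    Next
End Sub
```

Using a dedicated loop variable (member) rather than the collection variable itself also avoids the self-iteration of cases 7 and 8.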

UPDATE 4: The Variant Comparison Curse

Variants exhibit weird behaviors in many situations. In particular, comparing two Variants of different sub-types yields surprising results. Consider these simple examples:

Sub Test1()
  Dim x, y: x = 30: y = "20"
  Debug.Print x > y               ' False !!
End Sub

Sub Test2()
  Dim x As Long, y: x = 30: y = "20"
  '     ^^^^^^^^
  Debug.Print x > y             ' True
End Sub

Sub Test3()
  Dim x, y As String:  x = 30: y = "20"
  '        ^^^^^^^^^
  Debug.Print x > y             ' True
End Sub

As you can see, when both variables, the number and the string, are declared as Variants, the comparison is not performed numerically. When at least one of them is explicitly typed, the comparison succeeds.

The same occurs when comparing for equality! For instance, ?2="2" returns True, but if you define two Variant variables, assign them those values and compare them, the comparison fails!

Sub Test4()
  Debug.Print 2 = "2"           ' True

  Dim x, y:  x = 2:  y = "2"
  Debug.Print x = y             ' False !

End Sub
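
A way to sidestep the ambiguity, sketched here as an assumption rather than as part of the original tests, is to convert both Variants explicitly to the same type before comparing them. CLng and CStr are standard VBA conversion functions; the procedure name is hypothetical.

```vba
Sub CompareWithExplicitConversion()
    Dim x As Variant, y As Variant

    x = 30: y = "20"
    Debug.Print CLng(x) > CLng(y)     ' True: numeric comparison, 30 > 20
    Debug.Print CStr(x) > CStr(y)     ' True: string comparison, "30" > "20"

    x = 2: y = "2"
    Debug.Print CLng(x) = CLng(y)     ' True: numeric comparison, 2 = 2
End Sub
```

With the conversion spelled out, the kind of comparison (numeric or string) no longer depends on how the variables were declared.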
A.S.H asked Feb 20 '17 21:02


1 Answer

Please see edits below!

For Each edits also added below under Edit2

More edits about ForEach and Collections at Edit3

One last edit about ForEach and Collections at Edit4

A final note about iteration behavior at Edit5

Part of the subtlety of this odd behavior lies in the semantics of Variant evaluation when a Variant is used as a loop control variable or terminating condition.

In a nutshell, when a variant is the terminating value, or the control variable, the terminating value is naturally re-evaluated by the runtime with each iteration. A value type, however, such as an Integer, is pushed directly, and thus not re-evaluated (and its value doesn't change). If the control variable is an Integer, but the terminating value is a Variant, the Variant is coerced to an Integer on the first iteration, and pushed similarly. The same situation arises when the terminating condition is an expression involving a Variant and an Integer - it's coerced to an Integer.

In this example:

Dim v As Variant
v = 4
For v = 1 To v
    Debug.Print v,
Next

The variant v is assigned an integer value of 1, and the loop termination condition is re-evaluated because the terminating variable is a variant - the runtime recognizes the presence of the Variant reference and forces re-evaluation with each iteration. As a result, the loop completes after a single pass because of the in-loop reassignment: the variant now has a value of 1, so the loop termination condition is satisfied.

Consider this next example:

Dim v As Variant
v = 4
For v = 1 To v - 0
    Debug.Print v,
Next

When the terminating condition is an expression, such as "v - 0", the expression is evaluated and coerced to a regular integer, not a variant, and thus its hard value is pushed to the stack at runtime. As a result, the value is not re-evaluated upon each loop iteration.
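
Whatever the underlying mechanism, a sketch of a workaround that does not rely on it is to capture the limit in a separate, explicitly typed variable before the loop starts. The name lastValue is hypothetical.

```vba
Sub LoopWithCapturedLimit()
    Dim v As Variant, lastValue As Long

    v = 4
    lastValue = CLng(v)       ' capture the limit before v is reused as the counter

    For v = 1 To lastValue
        Debug.Print v,        ' 1, 2, 3, 4
    Next
End Sub
```

Since the end value no longer shares storage with the counter, reassigning v inside the loop header cannot affect it.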

The other interesting example:

Dim i As Integer
Dim v As Variant
v = 4
For i = 1 To v
    v = i - 1
    Debug.Print i,
Next

behaves as it does because the control variable is an Integer, and thus the terminating variable is coerced to an integer as well, then pushed to the stack for iteration.

I cannot swear these are the semantics, but I believe the terminating condition or value is simply pushed onto a stack, thus the integer value is pushed, or the Variant's object reference is pushed, thus triggering the re-evaluation when the compiler realizes a variant holds the terminating value. When the variant gets reassigned within the loop, and the value is re-queried as the loop completes, the new value is returned, and the loop terminates.

Sorry if that's a little muddy, but it's kinda late, but I saw this and couldn't help but take a shot at an answer. Hope it makes some sense. Ah, good ol' VBA :)

EDIT:

Found some actual info from the VBA language spec at MS:

The expressions [start-value], [end-value], and [step-increment] are evaluated once, in order, and prior to any of the following computations. If the value of [start-value], [end-value], and [step-increment] are not Let-coercible to Double, error 13 (Type mismatch) is raised immediately. Otherwise, proceed with the following algorithm using the original, uncoerced values.

Execution of the [for-statement] proceeds according to the following algorithm:

  1. If the data value of [step-increment] is zero or a positive number, and the value of [bound-variable-expression] is greater than the value of [end-value], then execution of the [for-statement] immediately completes; otherwise, advance to Step 2.

  2. If the data value of [step-increment] is a negative number, and the value of [bound-variable-expression] is less than the value of [end-value], execution of the [for-statement] immediately completes; otherwise, advance to Step 3.

  3. The [statement-block] is executed. If a [nested-for-statement] is present, it is then executed. Finally, the value of [bound-variable-expression] is added to the value of [step-increment] and Let-assigned back to [bound-variable-expression]. Execution then repeats at step 1.
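
The quoted algorithm can be sketched as an explicit loop. This is only an illustration of the three steps, not how the runtime is actually implemented; the variable names are hypothetical, and the initial assignment of the start value to the counter is assumed, since the excerpt above does not show it.

```vba
Sub SpecForLoopSketch()
    Dim counter As Long
    Dim startValue As Long, endValue As Long, stepIncrement As Long

    ' Evaluated once, in order, before anything else.
    startValue = 1
    endValue = 4
    stepIncrement = 1

    counter = startValue      ' assumed initial assignment of the bound variable
    Do
        ' Step 1: non-negative step and counter past the end value.
        If stepIncrement >= 0 And counter > endValue Then Exit Do
        ' Step 2: negative step and counter below the end value.
        If stepIncrement < 0 And counter < endValue Then Exit Do
        ' Step 3: run the body, then add the step to the counter.
        Debug.Print counter,  ' 1, 2, 3, 4
        counter = counter + stepIncrement
    Loop
End Sub
```

In this form it is plain that endValue is read from a fixed copy on every pass, which is what the spec text implies for the real For statement.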

What I gather from this is that the intent is for the terminating condition value to be evaluated once and once only. If we see evidence that changing that value changes the behavior of the loop from its initial condition, it is almost certainly due to what might be termed informally as accidental re-evaluation because it's a variant. If it's unintentional, we can probably only use anecdotal evidence to predict its behavior.

As the runtime evaluates a loop's start/end/step values and pushes the "value" of those expressions onto the stack, a Variant value throws a "byref wrench" into the process. If the runtime does not first recognize the variant, evaluate it, and push that value as the terminating condition, curious behavior (as you are showing) would almost certainly ensue. Exactly how VBA handles variants in this case would be a great task for pcode analysis, as others have suggested.

EDIT2: FOREACH

The VBA spec again provides insight into the evaluation of ForEach loops over collections and arrays:

The expression [collection] is evaluated once prior to any of the following computations.

  1. If the data value of [collection] is an array:

    If the array has no elements, then execution of the [for-each-statement] immediately completes.

    If the declared type of the array is Object, then the [bound-variable-expression] is Set-assigned to the first element in the array. Otherwise, the [bound-variable-expression] is Let-assigned to the first element in the array.

    After [bound-variable-expression] has been set, the [statement-block] is executed. If a [nested-for-statement] is present, it is then executed.

    Once the [statement-block] and, if present, the [nested-for-statement] have completed execution, [bound-variable-expression] is Let-assigned to the next element in the array (or Set-assigned if it is an array of Object). If and only if there are no more elements in the array, then execution of the [for-each-statement] immediately completes. Otherwise, [statement-block] is executed again, followed by [nested-for-statement] if present, and this step is repeated.

    When the [for-each-statement] has finished executing, the value of [bound-variable-expression] is the data value of the last element of the array.

  2. If the data value of [collection] is not an array:

    The data value of [collection] must be an object-reference to an external object that supports an implementation-defined enumeration interface. The [bound-variable-expression] is either Let-assigned or Set-assigned to the first element in [collection] in an implementation-defined manner.

    After [bound-variable-expression] has been set, the [statement-block] is executed. If a [nested-for-statement] is present, it is then executed.

    Once the [statement-block] and, if present, the [nested-for-statement] have completed execution, [bound-variable-expression] is Set-assigned to the next element in [collection] in an implementation-defined manner. If there are no more elements in [collection], then execution of the [for-each-statement] immediately completes. Otherwise, [statement-block] is executed again, followed by [nested-for-statement] if present, and this step is repeated.

    When the [for-each-statement] has finished executing, the value of [bound-variable-expression] is the data value of the last element in [collection].

Using this as a base, I think it becomes clear that a Variant assigned to a variable that then becomes the bound-variable-expression generates the "Array is locked" error in this example:

Dim v As Variant, vv As Variant
v = Array(1, 1, 1, 1)
i = 1
' Any of the Commented lines below generates the same RT error:
For Each v In v  ' "This array is fixed or temporarily locked"
'For Each vv In v
    'v = 4
    'ReDim Preserve v(LBound(v) To UBound(v))
    If i < UBound(v) Then v(i + 1) = i + 1 ' so we can alter the entries in the array, but not the array itself
    i = i + 1
     Debug.Print vv,            ' 1, 2, 3, 4
Next

Using 'v' as the [bound-variable-expression] creates a Let-assignment back to v that is prevented by the runtime because v is the target of an enumeration underway to support the ForEach loop itself; that is, the runtime locks the variant, thus precluding the loop from assigning a different value to the variant as would necessarily have to occur.

This also applies to the 'Redim Preserve' - resizing or changing the array, thus changing the variant's assignment, is going to violate the lock placed on the enumeration target at the loop's initialization.
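
If the array variable really has to be reassigned inside the loop, one option is to iterate over a copy. The sketch below assumes the usual VBA behavior that assigning an array held in a Variant to another Variant copies the array, so the lock taken by the For Each applies to the copy only; the names snapshot and element are hypothetical.

```vba
Sub IterateOverArrayCopy()
    Dim v As Variant, snapshot As Variant, element As Variant

    v = Array(1, 2, 3, 4)
    snapshot = v                  ' copies the array; the loop locks snapshot, not v

    For Each element In snapshot
        ' v is not the enumeration target, so it can be reassigned.
        If element = 2 Then v = "reassigned"
        Debug.Print element,
    Next
End Sub
```

The loop still sees the original four elements, because it enumerates the snapshot taken before any reassignment.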

With regard to Range-based assignments/iteration, note that the separate semantics for non-array collections kick in; the "external objects" provide an implementation-specific enumeration behavior. An Excel Range object has a _Default property that is called when the object is referenced by name only, as in this case. It does not take an implicit lock when used as the iteration target of the ForEach (and thus does not generate the locking error, as it has different semantics than the Variant variety):

Debug.Print vbCrLf & "Case10 range in range",
Set rng = Range("A1:D1") '.Cells.Cells add as many as you want
For Each rng In rng ' (another implicit .Cells here?)
    Debug.Print rng.Column,     ' 1, 2, 3, 4
Next

(The _Default property can be identified by examining the Excel object library within the VBA Object Browser: highlight the Range object, right-click, and select "Show Hidden Members".)

EDIT3: Collections

The code involving collections gets interesting and a little hairy :)

Debug.Print vbCrLf & "Case7 obj in col",
Set obj = New Collection: For i = 1 To 4: obj.Add Cells(i, i): Next
For Each obj In obj
    Debug.Print obj.Column,    ' 1 only ?
Next

Debug.Print vbCrLf & "Case8 var in col",
Set v = New Collection: For i = 1 To 4: v.Add Cells(i, i): Next
For Each v In v
    Debug.Print v.column,      ' nothing!
Next

This is where nothing less than a genuine bug has to be considered at play. When I first ran these two samples in the VBA debugger, they ran precisely as the OP described in the initial question. Then, after I ran a few tests, restored the code to its original form (as shown here), and restarted the routine, the latter behavior arbitrarily started matching that of the object-based predecessor above it! Only after I stopped Excel and restarted it did the original behavior of the latter loop (printing nothing) return. There's really no way to explain that other than a compiler bug.

EDIT4 Reproducible behavior with Variants

After noting that I'd done something within the debugger to force the variant-based iteration through a Collection to loop at least once (as it had with the Object version), I finally found a code-reproducible way of changing the behavior.

Consider this original code:

Dim v As Variant, vv As Variant

Set v = New Collection: For x = 1 To 4: v.Add Cells(x, x): Next x
'Set vv = v
For Each v In v
   Debug.Print v.Column
Next

This is essentially the OP's original case, and the ForEach loop terminates without a single iteration. Now, uncomment the 'Set vv = v' line and re-run: the For Each will iterate one time. I think there's no question that we've found some very (very!) subtle bug in the Variant evaluation mechanism of the VB runtime; the arbitrary setting of another 'Variant' equal to the loop variable forces an evaluation that does not take place in the For Each evaluation - and I suspect that's tied to the fact that the Collection is represented within the Variant as a Variant/Object/Collection. Adding this bogus 'Set' seems to force the issue and make the loop operate as the Object-based version does.

EDIT5: A final thought about iterations and collections

This will probably be my last edit to this answer. One thing I had to force myself to recognize while observing the odd loop behavior, when the same variable was used as both the 'bound-variable-expression' and the limit expression, is that, particularly with 'Variants', the behavior is sometimes induced by the iteration changing the contents of the 'bound-variable-expression.' That is, if you have:

Dim v As Variant
Dim vv As Variant
Set v = New Collection: For x = 1 To 4: v.Add Cells(x, x): Next
Set vv = v ' placeholder to make the loop "kinda" work
For Each v In v
    'do something
Next

it is vital to keep in mind (at least it was for me) that within the For Each, the 'bound-variable-expression' held in 'v' gets changed by the iteration itself. That is, when we start the loop, v holds a Collection, and the enumeration begins. But once that enumeration starts, the contents of v are the product of the enumeration - in this case, a Range object (from Cells). This behavior can be seen in the debugger, where you can observe 'v' go from Collection to Range, meaning that the next step of the iteration returns whatever the enumeration context of the Range object would provide, not that of the 'Collection.'

This has been a great study and I appreciate the feedback. It's helped me understand things even better than I thought. Unless there are more comments or questions on this, I suspect this will be my last edit to the answer.

David W answered Nov 20 '22 17:11