Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When exactly do I need "indirect" with writing recursive enums?

Tags:

enums

swift

I'm surprised to find out that this compiles:

enum Foo {
    case one([Foo])
    // or
    // case one([String: Foo])
}

I would have expected the compiler to tell me to add indirect to the enum cases, as Foo contains more Foos. Arrays and Dictionarys are all value types, so there are no indirection here. To me, this is structurally similar to this:

enum Foo {
    case one(Bar<Foo>)
}

// arrays and dictionaries are just a struct with either something or nothing in it
// right?
struct Bar<T> {
    let t: T?
}

... which does require me to add indirect to the enum.

Then I tried tuples:

enum Foo {
    case one((Foo, Foo))
}

and this is considered a recursive enum too. I thought arrays and tuples have the same memory layout, and that is why you can convert arrays to tuples using withUnsafeBytes and then binding the memory...

What rules does this follow? Are there actually something special about arrays and dictionaries? If so, are there other such "special" types that looks like it should require indirect, but actually doesn't?

The Swift Guide is not very helpful. It just says:

A recursive enumeration is an enumeration that has another instance of the enumeration as the associated value for one or more of the enumeration cases. You indicate that an enumeration case is recursive by writing indirect before it, which tells the compiler to insert the necessary layer of indirection.

What "has another instance of the enumeration" means is rather vague. It is even possible to interpret it to mean that case one(Bar<Foo>) is not recursive - the case only has an instance of Bar<Foo>, not Foo.

like image 378
Sweeper Avatar asked Dec 17 '21 11:12

Sweeper


2 Answers

I think part of the confusion stems from this assumption:

I thought arrays and tuples have the same memory layout, and that is why you can convert arrays to tuples using withUnsafeBytes and then binding the memory...

Arrays and tuples don't have the same memory layout:

  • Array<T> is a fixed-size struct with a pointer to a buffer which holds the array elements contiguously* in memory
    • Contiguity is promised only in the case of native Swift arrays [not bridged from Objective-C]. NSArray instances do not guarantee that their underlying storage is contiguous, but in the end this does not have an effect on the code below.
  • Tuples are fixed-size buffers of elements held contiguously in memory

The key thing is that the size of an Array<T> does not change with the number of elements held (its size is simply the size of a pointer to the buffer), while a tuple does. The tuple is more equivalent to the buffer the array holds, and not the array itself.

Array<T>.withUnsafeBytes calls Array<T>.withUnsafeBufferPointer, which returns the pointer to the buffer, not to the array itself. *(In the case of a non-contiguous bridged NSArray, _ArrayBuffer.withUnsafeBufferPointer has to create a temporary contiguous copy of its contents in order to return a valid buffer pointer to you.)


When laying out memory for types, the compiler needs to know how large the type is. Given the above, an Array<Foo> is statically known to be fixed in size: the size of one pointer (to a buffer elsewhere in memory).

Given

enum Foo {
    case one((Foo, Foo))
}

in order to lay out the size of Foo, you need to figure out the maximum size of all of its cases. It has only the single case, so it would be the size of that case.

Figuring out the size of one requires figuring out the size of its associated value, and the size of a tuple of elements is the sum of the size of the elements themselves (taking into account padding and alignment, but we don't really care about that here).

Thus, the size of Foo is the size of one, which is the size of (Foo, Foo) laid out in memory. So, what is the size of (Foo, Foo)? Well, it's the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo... each of which is the size of Foo + the size of Foo...

Where Array<Foo> had a way out (Array<T> is the same size regardless of T), we're stuck in an infinite loop with no base case.

indirect is the keyword required to break out of the recursion and give this infinite reference a base case. It inserts an implicit pointer by making a given case the fixed size of a pointer, regardless of what it contains or points to. That makes the size of one fixed, which allows Foo to have a fixed size.

indirect is less about Foo referring to Foo in any way, and more about allowing an enum case to potentially contain itself indirectly (because direct containment would lead to an infinite loop).


As an aside, this is also why a struct cannot contain a direct instance of itself:

struct Foo {
    let foo: Foo // error: Value type 'Foo' cannot have a stored property that recursively contains it
}

would lead to infinite recursion, while

struct Foo {
    let foo: UnsafePointer<Foo>
}

is fine.

structs don't support the indirect keyword (at least in a struct, you have more direct control over storage and layout), but there have been pitches for adding support for this on the Swift forums.

like image 104
Itai Ferber Avatar answered Nov 15 '22 06:11

Itai Ferber


Indirect on enum cases can be used to tell the compiler to store the associated value of that enum as a pointer. Reference.

This is necessary when the compiler couldn't figure out the memory layout of the associated value without using a pointer, such as in the case of recursive enums.

The reason why you need to make the below case indirect is because by using a generic struct that holds your enum as an associated value on a case of the enum itself, you end up with a recursive enum, since you can nest Foo and Bar in each other infinitely many times.

enum MaybeRecursive {
    case array([MaybeRecursive])
    case dict([String: MaybeRecursive])
    indirect case genericStruct(GenericStruct<MaybeRecursive>)
}

struct GenericStruct<T> {
    let t: T?
}

// if you replaced `.array` with `.genericStruct` again, you could keep going forever
let associatedValue = GenericStruct(t: MaybeRecursive.genericStruct(.init(t: .array([])))) 

MaybeRecursive.genericStruct(associatedValue)

As for the Array vs Tuple difference, Itai Ferber has expertly explained that part in their own answer.

like image 35
Dávid Pásztor Avatar answered Nov 15 '22 07:11

Dávid Pásztor