
When does the copying take place for swift value types

Tags:

swift

In Swift, when you pass a value type, say an Array, to a function, a copy of the array is made for the function to use.

However the documentation at https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-XID_134 also says:

The description above refers to the “copying” of strings, arrays, and dictionaries. The behavior you see in your code will always be as if a copy took place. However, Swift only performs an actual copy behind the scenes when it is absolutely necessary to do so. Swift manages all value copying to ensure optimal performance, and you should not avoid assignment to try to preempt this optimization.

So does it mean that the copying actually only takes place when the passed value type is modified?

Is there a way to demonstrate that this is actually the underlying behavior?

Why is this important? If I create a large immutable array and want to pass it from function to function, I certainly do not want to keep making copies of it. Should I just use NSArray in this case, or would the Swift Array work fine as long as I do not try to manipulate the passed-in Array?

Now, as long as I do not explicitly make the variables in the function editable by using var or inout, the function cannot modify the array anyway. So does it still make a copy? Granted, another thread could modify the original array elsewhere (only if it is mutable), which would make a copy necessary at the moment the function is called (but only if the array passed in is mutable). So if the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right? So what does Apple mean by the phrase above?

Barka asked Oct 27 '14 18:10


2 Answers

TL;DR:

So does it mean that the copying actually only takes place when the passed value type is modified?

Yes!

Is there a way to demonstrate that this is actually the underlying behavior?

See the first example in the section on the copy-on-write optimization.

Should I just use NSArray in this case or would the Swift Array work fine as long as I do not try to manipulate the passed-in Array?

If you pass your array as inout, you get pass-by-reference semantics, which obviously avoids unnecessary copies. If you pass your array as a normal parameter, the copy-on-write optimization kicks in and you shouldn't notice any performance drop, while still benefiting from more type safety than what you'd get with an NSArray.
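To see the "normal parameter" case in action, the sketch below compares the address of an array's element storage in the caller and in a callee, then mutates a local copy inside another function. The helpers `storageAddress(of:)` and `appendingZero(to:)` are hypothetical, and element addresses are an implementation detail rather than a documented guarantee, so treat this as an illustration only.

```swift
// Hypothetical helper: reports where the array's elements live in memory.
// Comparing addresses is for illustration only (implementation detail).
func storageAddress(of numbers: [Int]) -> UnsafeRawPointer? {
    return numbers.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

// Hypothetical helper: mutates a local copy of its parameter.
func appendingZero(to numbers: [Int]) -> [Int] {
    var local = numbers  // no element copy yet: `local` shares storage with `numbers`
    local.append(0)      // first mutation of shared storage: elements are copied here
    return local
}

let large = Array(0..<100_000)
let callerAddress = large.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
let calleeAddress = storageAddress(of: large)
print(callerAddress == calleeAddress)  // true: passing the array did not copy its elements

let extended = appendingZero(to: large)
print(large.count, extended.count)  // 100000 100001
print(storageAddress(of: extended) == callerAddress)  // false: the mutation forced a copy
```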

Now as long as I do not explicitly make the variables in the function editable by using var or inout, then the function can not modify the array anyway. So does it still make a copy?

You will get a "copy", in the abstract sense. In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism, hence avoiding unnecessary copies.

If the original array is immutable and the function is not using var or inout, there is no point in Swift creating a copy. Right?

Exactly, hence the copy-on-write mechanism.

So what does Apple mean by the phrase above?

Essentially, Apple means that you shouldn't worry about the "cost" of copying value types, as Swift optimizes it for you behind the scenes.

Instead, you should just think about the semantics of value types, which is that you get a copy as soon as you assign them or pass them as parameters. What Swift's compiler actually generates is the compiler's business.
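A minimal illustration of those semantics: after an assignment, the two variables behave as independent arrays, whatever the compiler does with the storage underneath.

```swift
let original = [1, 2, 3]
var copy = original  // semantically, `copy` is now an independent array
copy.append(4)       // only `copy` changes

print(original)  // [1, 2, 3]
print(copy)      // [1, 2, 3, 4]
```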

Value types semantics

Swift does indeed treat arrays as value types (as opposed to reference types), along with structures, enumerations and most other built-in types (i.e. those that are part of the standard library and not Foundation). At the memory level, these types are actually immutable plain old data objects (POD), which enables interesting optimizations. Indeed, they are typically allocated on the stack rather than the heap [1] (https://en.wikipedia.org/wiki/Stack-based_memory_allocation). This allows the CPU to manage them very efficiently and to automatically deallocate their memory as soon as the function exits [2], without the need for any garbage collection strategy.

Values are copied whenever they are assigned or passed to a function. These semantics have various advantages, such as avoiding the creation of unintended aliases, but also making it easier for the compiler to guarantee the lifetime of values stored in another object or captured by a closure. Think about how hard it can be to manage good old C pointers to understand why.
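The same holds for function arguments. In this sketch, `Settings` and `louder(_:)` are hypothetical names: the callee works on its own copy, so the caller's value cannot be changed behind its back (no aliasing).

```swift
// Hypothetical value type used only for this illustration.
struct Settings {
    var volume: Int
}

// Parameters are immutable; a local `var` copy is needed to change anything.
func louder(_ settings: Settings) -> Settings {
    var local = settings
    local.volume += 10
    return local
}

let current = Settings(volume: 5)
let updated = louder(current)
print(current.volume, updated.volume)  // 5 15
```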

One may think it's an ill-conceived strategy, as it involves copying every single time a variable is assigned or a function is called. But as counterintuitive as it may be, copying small types is usually as cheap as, if not cheaper than, passing a reference. After all, a pointer is usually the same size as an integer...
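`MemoryLayout` can make the size argument concrete. The `Point` struct below is a hypothetical small value type; copying it moves two machine words, while `Int` is exactly as wide as a pointer on Swift's platforms.

```swift
// Hypothetical small value type: two Doubles stored inline.
struct Point {
    var x: Double
    var y: Double
}

print(MemoryLayout<Int>.size == MemoryLayout<UnsafeRawPointer>.size)  // true
print(MemoryLayout<Point>.size)  // 16
```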

Concerns are, however, legitimate for large collections (i.e. arrays, sets and dictionaries) and, to a lesser extent, very large structures [3]. But the compiler has a trick to handle these, namely copy-on-write (see later).
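The contrast can be sketched with `MemoryLayout` again. `Large` is a hypothetical struct whose sixteen `Int` fields are stored inline, so copying it copies all of them; an `[Int]` value, on current standard-library implementations, is a single reference to a heap buffer, which is what lets copy-on-write defer copying the elements.

```swift
// Hypothetical large struct: sixteen Ints stored inline.
struct Large {
    var fields: (Int, Int, Int, Int, Int, Int, Int, Int,
                 Int, Int, Int, Int, Int, Int, Int, Int)
}

let wordSize = MemoryLayout<Int>.size
// Copying a Large value copies all sixteen words.
print(MemoryLayout<Large>.size / wordSize)  // 16
// An array value is one reference; its elements live in a separate buffer.
print(MemoryLayout<[Int]>.size == MemoryLayout<UnsafeRawPointer>.size)  // true
```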

What about mutating

Structures can define mutating methods, which are allowed to mutate the fields of the structure. This doesn't contradict the fact that value types are nothing more than immutable PODs, as calling a mutating method is in fact merely syntactic sugar for reassigning a variable to a brand new value that's identical to the previous one, except for the fields that were mutated. The following example illustrates this semantic equivalence:

struct S {
  var foo: Int
  var bar: Int
  mutating func modify() {
    foo = bar
  }
}

var s1 = S(foo: 0, bar: 10)
s1.modify()

// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
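A `didSet` observer offers one way to watch this reassignment view: calling a mutating method on an observed stored property fires the observer, just as assigning a new value would. `Pair` and `Holder` are hypothetical names; `Pair` mirrors `S` above.

```swift
struct Pair {
    var foo: Int
    var bar: Int
    mutating func modify() { foo = bar }
}

// Hypothetical holder that counts how often `pair` is (re)assigned.
final class Holder {
    var setCount = 0
    var pair = Pair(foo: 0, bar: 10) {
        didSet { setCount += 1 }
    }
}

let holder = Holder()
holder.pair.modify()    // observed as a write to the whole `pair` property
print(holder.setCount)  // 1
print(holder.pair.foo)  // 10
```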

Reference types semantics

Unlike value types, reference types are essentially pointers to the heap at the memory level. Their semantics are closer to what we would get in reference-based languages such as Java, Python or JavaScript. This means the objects themselves are not copied when assigned or passed to a function; only their address is. Because the CPU is no longer able to manage the memory of these objects automatically, Swift uses reference counting (https://en.wikipedia.org/wiki/Reference_counting) to reclaim them behind the scenes.

These semantics have the obvious advantage of avoiding copies, as everything is assigned or passed by reference. The drawback is the danger of unintended aliases, as in almost any other reference-based language.
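The aliasing risk in a nutshell: assigning a class instance copies the reference, so a change made through one variable is visible through the other. `Account` is a hypothetical class.

```swift
// Hypothetical reference type.
final class Account {
    var balance: Int
    init(balance: Int) { self.balance = balance }
}

let first = Account(balance: 100)
let second = first      // copies the reference, not the object
second.balance = 0      // mutates the single shared object

print(first.balance)    // 0
print(first === second) // true
```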

What about inout

An inout parameter can be thought of as a read-write pointer to the expected type. In the case of value types, it means the function won't get a copy of the value but a pointer to it, so mutations inside the function affect the argument passed by the caller (hence the inout keyword). In other words, this gives value-type parameters reference semantics in the context of the function:

func f(x: inout [Int]) {
  x.append(12)
}

var a = [0]
f(x: &a)

// Prints '[0, 12]'
print(a)

In the case of reference types, it makes the reference itself mutable, pretty much as if the passed argument were the address of the address of the object:

import Foundation

func f(x: inout NSArray) {
  x = [12]
}

var a: NSArray = [0]
f(x: &a)

// Prints the NSArray description, a parenthesized list containing 12.
print(a)
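For contrast, a sketch with a hypothetical `Box` class: without inout, a function can already mutate the shared object (the reference is copied), but only with inout can it rebind the caller's variable to a different object.

```swift
// Hypothetical reference type.
final class Box {
    var value: Int
    init(value: Int) { self.value = value }
}

// Without inout: the reference is copied, so the object itself is shared.
func zeroOut(_ box: Box) { box.value = 0 }

// With inout: the caller's variable can be pointed at a new object.
func replace(_ box: inout Box) { box = Box(value: 99) }

var box = Box(value: 5)
let alias = box
zeroOut(box)
print(alias.value)             // 0
replace(&box)
print(box.value, alias.value)  // 99 0
print(box === alias)           // false
```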

Copy-on-write

Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that avoids unnecessary copies of mutable variables, and it is implemented by all of Swift's built-in collections (i.e. arrays, sets and dictionaries). When you assign an array (or pass it to a function), Swift doesn't make a copy of said array and actually uses a reference instead. The copy takes place as soon as your second array is mutated. This behavior can be demonstrated with the following snippet (Swift 4.1):

let array1 = [1, 2, 3]
var array2 = array1

// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }

array2[0] = 1

// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }

Indeed, array2 doesn't get a copy of array1 immediately, as shown by the fact that it points to the same address. Instead, the copy is triggered by the mutation of array2.

This optimization also happens deeper in the structure, meaning that if for instance your collection is made of other collections, the latter will also benefit from the copy-on-write mechanism, as demonstrated by the following snippet (Swift 4.1):

var array1 = [[1, 2], [3, 4]]
var array2 = array1

// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }

array2[0] = []

// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }

Replicating copy-on-write

It is in fact rather easy to implement the copy-on-write mechanism in Swift, as part of its reference-counting API is exposed to the user. The trick consists of wrapping a reference (e.g. a class instance) within a structure and checking whether that reference is uniquely referenced before mutating it. When that's the case, the wrapped value can be safely mutated; otherwise it should be copied:

final class Wrapped<T> {
  init(value: T) { self.value = value }
  var value: T
}

struct CopyOnWrite<T> {
  init(value: T) { self.wrapped = Wrapped(value: value) }
  var wrapped: Wrapped<T>
  var value: T {
    get { return wrapped.value }
    set {
      if isKnownUniquelyReferenced(&wrapped) {
        wrapped.value = newValue
      } else {
        wrapped = Wrapped(value: newValue)
      }
    }
  }
}

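// Hypothetical stand-in for a large value.
struct SomeLargeObject {
  var numbers = [Int](repeating: 0, count: 1_000)
}

var a = CopyOnWrite(value: SomeLargeObject())

// This line doesn't copy anything.
var b = a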
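A usage sketch for the wrapper above, repeated so the snippet stands alone and using `[Int]` as the payload instead of `SomeLargeObject`. It checks storage identity with `===` and `ObjectIdentifier` (the latter does not retain the object, so it doesn't disturb the uniqueness check): the first write to a shared wrapper allocates a new one, while a later write to a uniquely owned wrapper reuses it.

```swift
final class Wrapped<T> {
  init(value: T) { self.value = value }
  var value: T
}

struct CopyOnWrite<T> {
  init(value: T) { self.wrapped = Wrapped(value: value) }
  var wrapped: Wrapped<T>
  var value: T {
    get { return wrapped.value }
    set {
      if isKnownUniquelyReferenced(&wrapped) {
        wrapped.value = newValue            // sole owner: mutate in place
      } else {
        wrapped = Wrapped(value: newValue)  // shared: copy into fresh storage
      }
    }
  }
}

let a = CopyOnWrite(value: [1, 2, 3])
var b = a
print(a.wrapped === b.wrapped)  // true: b shares a's storage

b.value = [4, 5]                // storage is shared, so the setter copies
print(a.wrapped === b.wrapped)  // false
print(a.value, b.value)         // [1, 2, 3] [4, 5]

let idBefore = ObjectIdentifier(b.wrapped)
b.value = [6]                   // b is now the sole owner: mutated in place
print(ObjectIdentifier(b.wrapped) == idBefore)  // true
```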

However, there is an important caveat here! Reading the documentation for isKnownUniquelyReferenced, we get this warning:

If the instance passed as object is being accessed by multiple threads simultaneously, this function may still return true. Therefore, you must only call this function from mutating methods with appropriate thread synchronization.

This means the implementation presented above isn't thread-safe, as we may encounter situations where it wrongly assumes the wrapped object can be safely mutated, while in fact such a mutation would break invariants in another thread. Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs. The key is to understand what "accessed by multiple threads simultaneously" really means. In our example, this would happen if the same instance of CopyOnWrite were shared across multiple threads, for instance as part of a shared global variable. The wrapped object would then have thread-safe copy-on-write semantics, but the instance holding it would be subject to data races. The reason is that Swift must establish unique ownership to properly evaluate isKnownUniquelyReferenced [4], which it can't do if the owner of the instance is itself shared across multiple threads.
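When a single owner really must be shared across threads, one option (among several, such as serial queues or actors) is to serialize every access to it, which is the kind of "appropriate thread synchronization" the documentation asks for. This is a minimal sketch assuming Foundation's `NSLock`; `LockedArray` is a hypothetical name, and `@unchecked Sendable` only tells the compiler that the lock provides the safety.

```swift
import Foundation
import Dispatch

// Hypothetical wrapper: one shared owner, every access serialized by a lock.
final class LockedArray: @unchecked Sendable {
    private let lock = NSLock()
    private var storage: [Int] = []

    func append(_ element: Int) {
        lock.lock()
        defer { lock.unlock() }
        storage.append(element)  // only one thread mutates the array at a time
    }

    var snapshot: [Int] {
        lock.lock()
        defer { lock.unlock() }
        return storage  // a value: later appends won't change the returned array
    }
}

let shared = LockedArray()
DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    shared.append(i)
}
print(shared.snapshot.count)  // 1000
```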

Value types and multithreading

It is Swift's intention to alleviate the burden on the programmer when dealing with multithreaded environments, as stated on Apple's blog (https://developer.apple.com/swift/blog/?id=10):

One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you. This can create nasty bugs that are extremely hard to debug.

Ultimately, the copy-on-write mechanism is a resource management optimization that, like any other optimization technique, one shouldn't think about when writing code [5]. Instead, one should think in more abstract terms and consider values to be effectively copied when assigned or passed as arguments.


[1] This holds only for values used as local variables. Values used as fields of a reference type (e.g. a class) are also stored in the heap.

[2] One could confirm this by inspecting the LLVM IR that's produced when dealing with value types rather than reference types, but since the Swift compiler is very eager to perform constant propagation, building a minimal example is a bit tricky.

[3] Swift doesn't allow structures to reference themselves, as the compiler would be unable to compute the size of such type statically. Therefore, it is not very realistic to think of a structure that is so large that copying it would become a legitimate concern.

[4] This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter, as it's currently Swift's way to establish ownership.

[5] Although passing copies of value-type instances should be safe, there's an open issue that suggests some problems with the current implementation (https://bugs.swift.org/browse/SR-6543).

Alvae answered Nov 16 '22 04:11


I don't know if that's the same for every value type in Swift, but for Arrays I'm pretty sure it's copy-on-write, so it doesn't copy the array unless you modify it, and, as you said, if you pass it around as a constant you don't run that risk anyway.

P.S. In Swift 1.2 there are new APIs you can use to implement copy-on-write on your own value types too.

DeFrenZ answered Nov 16 '22 03:11