I don't get how something like a Set can be immutable and still have acceptable performance.
From what I've read, F# Sets internally use red-black trees as their implementation. If every time we want to add something new to a red-black tree we basically have to recreate it, how can it ever have good performance? What am I missing here?
Although I am asking this about F#'s Sets, I think it is just as relevant in any other language that has or uses immutable data structures.
Thanks
Slower or faster? At this very technical, conceptual level, immutability can only make things slower. Hardware does best when it isn't sporadically allocating memory and can just modify existing memory instead (this is why we have concepts like temporal locality).
Immutable data structures provide referential transparency, which makes it easier to reason about a program locally. Another way to think about it: every time we execute a pure (referentially transparent) function with the same input, we get the same output.
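As a minimal sketch of that idea in Python (the function name `add_point` is invented for illustration): a pure function over an immutable `frozenset` always returns the same result for the same input and never disturbs its argument.

```python
# A referentially transparent function: same input, same output, no
# observable side effects, so any call can be replaced by its result.
def add_point(points, p):
    return points | {p}   # union builds a brand-new frozenset

pts = frozenset({(0, 0), (1, 1)})

# Calling it twice with the same input gives equal results...
assert add_point(pts, (2, 2)) == add_point(pts, (2, 2))
# ...and the original set is untouched.
assert pts == frozenset({(0, 0), (1, 1)})
```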
One advantage of immutability is that you can optimize your application by exploiting reference and value equality: it becomes easy to tell whether anything has changed. Consider, for example, how state changes are detected in a React component.
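A rough sketch of that change-detection trick in Python (the `update` helper is hypothetical, not a React API): when state is never mutated in place, "did anything change?" reduces to a cheap identity check instead of a deep comparison.

```python
# Hypothetical state-update helper: returns the SAME object when nothing
# changed, a NEW dict otherwise, so callers can compare by identity.
old_state = {"user": "ada", "items": (1, 2, 3)}

def update(state, **changes):
    if all(state.get(k) == v for k, v in changes.items()):
        return state                # no change: reuse the old object
    return {**state, **changes}     # change: build a fresh object

unchanged = update(old_state, user="ada")
changed = update(old_state, user="grace")

assert unchanged is old_state       # identity says: nothing to re-render
assert changed is not old_state     # a new object signals a change
```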
In Python, for example, types like int, float, bool, str, and tuple are immutable, while list, set, and dict are mutable.
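A quick demonstration of that split: attempting to assign into a tuple raises a `TypeError`, while the same assignment on a list succeeds in place.

```python
t = (1, 2, 3)            # tuples are immutable
try:
    t[0] = 99
except TypeError as e:
    print("immutable:", e)   # 'tuple' object does not support item assignment

lst = [1, 2, 3]          # lists are mutable
lst[0] = 99
print(lst)               # [99, 2, 3]
```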
If you design a data structure with only a few attributes based on primitive or other immutable types, try immutability first. If you are designing a data type that involves large (or unbounded) arrays, random access, and changing contents, use mutability. For situations between these two extremes, use your judgement. But YMMV.
Probably the most performance-critical uses of immutable data structures take on this character: modifying chunky pieces of data, making the modified pieces unique in the process, while shallow-copying (that is, sharing) the unmodified pieces.
In JavaScript, embracing immutability efficiently requires a third-party library such as Immutable.js, which provides efficient implementations of persistent data structures; most programming languages have libraries that do the same.
There has been a recent trend in JavaScript towards treating data structures as immutable. For example, if you need to change a single property of an object, it is better to create a whole new object with the new property, copy over all the other properties from the old object, and let the old object be garbage-collected.
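That copy-instead-of-mutate style can be sketched in a few lines, here using a plain Python dict in place of a JavaScript object:

```python
old = {"name": "widget", "price": 10, "stock": 5}

# Build a new dict with one property changed; the rest are copied over,
# and the old object is left for the garbage collector once unreferenced.
new = {**old, "price": 12}

print(old["price"])   # old object is untouched: 10
print(new["price"])   # 12
```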
Almost all immutable collections are some form of balanced tree. To create a new tree, you only have to reallocate the nodes on the path from the change (insert, remove, "update") to the root. As long as the tree is balanced, this takes logarithmic time. If you have something like a 2-3-4 tree (similar to red-black trees) with expected out-degree three, you can handle a million elements with only about a dozen allocations per operation.
And in languages where data structures are expected to be pure, the runtime makes sure allocation is fast: allocating a four-element node costs about a compare, an increment, and four stores, and in many cases the cost of the compare can be amortized over several allocations.
If you want to know more about how these structures work, an excellent source is Purely Functional Data Structures by Chris Okasaki.
You do not have to recreate the whole tree. Many of the branches stay the same and can be reused. As a simple example, if a new node needs to be added under a leaf of the current tree, then only the nodes on the path from the root to that leaf need to be cloned and given new branches.
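This path-copying idea can be shown with a minimal sketch in Python, using an unbalanced binary search tree for brevity (a real F# Set keeps the tree balanced, which is what bounds the copied path to logarithmic length):

```python
# Persistent insert by path copying: only nodes on the root-to-leaf path
# are cloned; every untouched subtree is shared between old and new trees.
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present: reuse the whole subtree

old = insert(insert(insert(None, 5), 2), 8)   # tree: 5 / (2, 8)
new = insert(old, 9)                          # 9 goes under the right child

assert new is not old           # the root was cloned...
assert new.left is old.left     # ...but the untouched left subtree is shared
```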