I recently listened to Rich Hickey's interview on Software Engineering Radio. During the interview Rich mentioned that Clojure's collections are implemented as trees. I'm hoping to implement persistent data structures in another language, and would like to understand how sets and Clojure's other persistent data structures are implemented.
What would the tree look like at each point in the following scenario?
Create the set {1 2 3}
Create the union of {1 2 3} and {4}
Create the difference of {1 2 3 4} and {1}
I'd like to understand how the three sets generated ({1 2 3}, {1 2 3 4}, and {2 3 4}) share structure, and how "deletions" are handled.
I'd also like to know the maximum number of branches that a node may have. Rich mentioned in the interview that the trees are shallow, so presumably the branching factor is greater than two.
You probably need to read the work of Phil Bagwell. His research into data structures is the basis of the persistent data structures in Clojure, Haskell and Scala.
There is this talk by Phil at Clojure/Conj: http://www.youtube.com/watch?v=K2NYwP90bNs
There are also some papers:
You can also read Purely Functional Data Structures by Chris Okasaki. This blog post talks about the book: http://okasaki.blogspot.com.br/2008/02/ten-years-of-purely-functional-data.html
You should really read Clojure Programming; it covers this in great detail, including pictures. Briefly though, each collection is a reference to the root of a tree, and its contents are whatever a depth-first search from that root reaches. We can show your examples like this:
(def x #{1 2 3})
x
|
| \
|\ 3
1 \
2
(def y (conj x 4))
x y
| / \
| \ 4
|\ 3
1 \
2
(require '[clojure.set :refer [difference]])
(def z (difference y #{1}))
x y
| / \
| \ 4
|\ 3
1/\
z- 2
Note that these are just indicative, I'm not saying that this is exactly the layout Clojure uses internally. It's the gist though.
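The sharing in the drawings can be sketched with a toy tree. This is a minimal illustration under stated assumptions, not Clojure's actual layout: the node shape ({:val :left :right}) and the `insert` function are hypothetical, and the tree is binary, whereas Clojure's real hash maps and sets are hash tries with up to 32 children per node. The idea it shows is path copying: an insert rebuilds only the nodes on the path from the root to the change and reuses every other subtree.

```clojure
;; Hypothetical toy node: a map with :val, :left and :right.
;; insert copies only the nodes on the path to the new value;
;; subtrees off that path are reused as-is.
(defn insert [node v]
  (cond
    (nil? node)       {:val v :left nil :right nil}
    (< v (:val node)) (assoc node :left  (insert (:left node) v))
    (> v (:val node)) (assoc node :right (insert (:right node) v))
    :else             node))

(def t1 (reduce insert nil [2 1 3])) ; tree holding 1, 2, 3
(def t2 (insert t1 4))               ; new root, t1 is untouched

;; The left subtree is the very same object in both trees.
(identical? (:left t1) (:left t2))   ;=> true
;; The right subtree lies on the changed path, so it was copied.
(identical? (:right t1) (:right t2)) ;=> false
;; The old tree still has no node for 4.
(get-in t1 [:right :right])          ;=> nil
```

A "deletion" works the same way in this sketch: it would build a new path that omits the value and leave the old tree intact for anyone still holding it.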
I like SCdF's drawings and explanations, but if you're looking for more depth you should read the excellent series of articles on Clojure's data structures at Higher-Order. It explains in detail how Clojure's maps work, and Clojure's sets are just a thin layer on top of its maps: #{:a :b} is implemented as a wrapper around {:a :a, :b :b}.
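To make the "thin layer" idea concrete, here is a rough sketch of a set modelled as a map from each element to itself. The names `set-as-map`, `sm-conj`, `sm-disj` and `sm-contains?` are made up for illustration and are not part of Clojure; only `zipmap`, `assoc`, `dissoc` and `contains?` are real core functions.

```clojure
;; Hypothetical helpers: a "set" is just a map of element -> element.
(defn set-as-map [& xs] (zipmap xs xs))
(defn sm-conj [s x] (assoc s x x))          ; add an element
(defn sm-disj [s x] (dissoc s x))           ; remove an element
(defn sm-contains? [s x] (contains? s x))   ; membership test

(def x (set-as-map 1 2 3))
(def y (sm-conj x 4))   ; plays the role of {1 2 3 4}
(def z (sm-disj y 1))   ; plays the role of {2 3 4}

(sm-contains? y 4) ;=> true
(sm-contains? z 1) ;=> false
(sm-contains? x 1) ;=> true, x is unchanged by the later operations
```

Because every operation delegates to the underlying persistent map, the set gets its structural sharing from the map without any extra work.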