Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the data structure behind Clojure's sets?

I recently listened to Rich Hickey's interview on Software Engineering Radio. During the interview Rich mentioned that Clojure's collections are implemented as trees. I'm hoping to implement persistent data structures in another language, and would like to understand how sets and Clojure's other persistent data structures are implemented.

What would the tree look like at each point in the following scenario?

  1. Create the set {1 2 3}

  2. Create the union of {1 2 3} and {4}

  3. Create the difference of {1 2 3 4} and {1}

I'd like to understand how the three sets generated ({1 2 3}, {1 2 3 4}, and {2 3 4}) share structure, and how "deletions" are handled.

I'd also like to know the maximum number of branches that a node may have. Rich mentioned in the interview that the trees are shallow, so presumably the branching factor is greater than two.

like image 412
davidchambers Avatar asked Apr 29 '13 02:04

davidchambers


3 Answers

You probably need to read the work of Phil Bagwell. His research into data structures is the base of Clojure, Haskell and Scala persistent data structures.

There is this talk by Phil at Clojure/Conj: http://www.youtube.com/watch?v=K2NYwP90bNs

There are also some papers:

  • Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
  • Ideal Hash Trees
  • More from Phil here

You can also read Purely Functional Data Structures by Chris Okasaki. This blog post talks about the book: http://okasaki.blogspot.com.br/2008/02/ten-years-of-purely-functional-data.html

like image 136
Rodrigo Taboada Avatar answered Nov 10 '22 11:11

Rodrigo Taboada


You should really read Clojure Programming, it covers this in great detail, including pictures. Briefly though, collections are depth first searches through trees. We can show your examples like this:

(def x #{1 2 3})

x
|
| \
|\ 3
1 \
   2

(def y (conj x 4))

 x  y
 | / \
 | \   4
 |\ 3
 1 \
    2

 (def z (difference y #{1}))

 x  y
 | / \
 | \  4
 |\ 3
 1/\
z-  2

Note that these are just indicative, I'm not saying that this is exactly the layout Clojure uses internally. It's the gist though.

like image 41
SCdF Avatar answered Nov 10 '22 11:11

SCdF


I like SCdF's drawings and explanations, but if you're looking for more depth you should read the excellent series of articles on Clojure's data structures at Higher-Order. It explains in detail how Clojure's maps work, and Clojure's sets are just a thin layer on top of its maps: #{:a :b} is implemented as a wrapping around {:a :a, :b :b}.

like image 8
amalloy Avatar answered Nov 10 '22 09:11

amalloy