Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to represent red-black trees?

Tags:

haskell

Okasaki uses (essentially)

data Color = R | B
data RB a = L | T {-# UNPACK #-}!Color !(RB a) !a !(RB a)

I know that in C, the color is typically handled in a more fiddly way to save space, doing something like making the low bit of a pointer represent color (I think usually the pointer to a node encodes its color, but it would also be possible to mimic Okasaki's structure by making the left or right pointer from a node represent its color).

Obviously, such bit-fiddling is impossible in Haskell. How, then, can the nodes be represented most efficiently in Haskell?

data RB' a = L | B !(RB a) !a !(RB a) | R !(RB a) !a !(RB a)

seems likely to be reasonably memory efficient, but it also seems likely to make pattern matching rather verbose.

like image 422
dfeuer Avatar asked Nov 02 '22 04:11

dfeuer


1 Answers

Only single constructor data types can be unpacked, and there is no way to have a "generic unpack" for polymorphic constructors. Your single-type construction of a tree, below, will actually be stored using pointer tagging. It has 3 constructors, one of which is the empty and will not contain any dereferences. As an aside, there seems to be an opportunity for GHC to optimize, but I don't think it does. Theoretically data Foo = A | B | C | ... Z could be represented as 26 distinct, reserved pointer values. I digress, however.

data RB' a = L | B !(RB a) !a !(RB a) | R !(RB a) !a !(RB a)

The above type will be represented as a tagged pointer, and pattern matching will be very efficient. I think this is what you were referring to when you mentioned memory. If you know the value of a, you could use associated data types (data families) to write more efficient constructors. A wonderful resource on this is Don Stewart's article Self-optimizing data structures: using types to make lists faster.

Data families would allow you to express something akin to this:

class AdaptRedBlackTree a where
  data RBTree a

  empty :: a
  insert :: a -> Tree a -> Tree a
  ...

instance RedBlackTree Int where
  data RBTree Int = RBEmptyInt 
                  | LInt (RBTree Int)
                         {-# UNPACK #-} Int
                         (RBTree Int) 
                  | RInt (RBTree Int)
                         {-# UNPACK #-} Int
                         (RBTree Int)

Unfortunately, further unpacking will be difficult, but at least you can avoid dereferences on the Int values.

like image 173
Aaron Friel Avatar answered Nov 09 '22 07:11

Aaron Friel