Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the most efficient way to represent finite (non-recursive) algebraic type values?

What's the most efficient way to serialize finite (non-recursive) algebraic-data-types which are comprised only of constructors?

e.g.

p = A
  | B q

q = C 
  | D r
  | E

r = F
  | G

Manually enumerating all valid combinations for this trivially small definition is possible:

A      0x00
B C    0x01
B D F  0x02
B D G  0x03
B E    0x04
  • Is there any broader theory here?

  • How about if we then add non-constructor types, such as ints, etc.?

  • How does Haskell represent these in memory (it allows recursion, so pointers/references will probably be necessary)?

like image 396
fadedbee Avatar asked Mar 22 '13 14:03

fadedbee


2 Answers

There's no completely standard class that does this, but it's pretty easy to make one yourself. I'll sketch one way of doing this:

data P = A | B Q deriving Show
data Q = C | D R | E deriving Show
data R = F | G  deriving Show

class Finite a where
    allValues :: [a]

instance Finite P where
    allValues = [A] ++ map B allValues

instance Finite Q where
    allValues = [C] ++ map D allValues ++ [E]

instance Finite R where
    allValues = [F] ++ [G]

I've written the instances this way to show that it's very easy and mechanical and could be done by a program (e.g. with generic programming or with Template Haskell). You could also add an instance to do some legwork for you, provided the type is Bounded and Enumerable:

instance (Bounded a, Enum a) => Finite a where
    allValues = [minBound..maxBound]

If you now add deriving (Bounded, Show) to R, that's one less instance to write!

Anyway, now we can evaluate allValues :: [P] and get back [A,B C,B (D F),B (D G),B E] - which you can then zip with [0..] to get your encoding and so on.


But surely this has been done before! I'm not using serialization much (if ever), but a quick search shows that the binary package and the binary-derive package can do something similar for you, without having to write the instances yourself. I would see if those do what you want first.

like image 115
yatima2975 Avatar answered Nov 16 '22 00:11

yatima2975


As for in-memory haskell representations, we can't represent things fully packed typically since the structures are lazy, and that means we need an indirection at each level. That said, unpacking will let you crush these things together. But, as far as I know, it won't pack bits from nested constructors into the same word.

There is a pointer-tagging optimization that shoves some information about the constructor in the pointer that directs to it: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging

For more on unpacking see this: http://www.haskell.org/haskellwiki/Performance/Data_types#Unpacking_strict_fields

like image 34
sclv Avatar answered Nov 16 '22 01:11

sclv