What's the most efficient way to represent finite (non-recursive) algebraic type values?

Question

What's the most efficient way to serialize finite (non-recursive) algebraic-data-types which are comprised only of constructors?

e.g.

p = A
  | B q

q = C 
  | D r
  | E

r = F
  | G

Manually enumerating all valid combinations for this trivially small definition is possible:

A      0x00
B C    0x01
B D F  0x02
B D G  0x03
B E    0x04

Is there any broader theory here?
How about if we then add non-constructor types, such as ints, etc.?
How does Haskell represent these in memory (it allows recursion, so pointers/references will probably be necessary)?

yatima2975 · Accepted Answer

There's no completely standard class that does this, but it's pretty easy to make one yourself. I'll sketch one way of doing this:

data P = A | B Q deriving Show
data Q = C | D R | E deriving Show
data R = F | G  deriving Show

class Finite a where
    allValues :: [a]

instance Finite P where
    allValues = [A] ++ map B allValues

instance Finite Q where
    allValues = [C] ++ map D allValues ++ [E]

instance Finite R where
    allValues = [F] ++ [G]

I've written the instances this way to show that it's very easy and mechanical and could be done by a program (e.g. with generic programming or with Template Haskell). You could also add an instance to do some legwork for you, provided the type is Bounded and Enumerable:

instance (Bounded a, Enum a) => Finite a where
    allValues = [minBound..maxBound]

If you now add deriving (Bounded, Show) to R, that's one less instance to write!

Anyway, now we can evaluate allValues :: [P] and get back [A,B C,B (D F),B (D G),B E] - which you can then zip with [0..] to get your encoding and so on.

But surely this has been done before! I'm not using serialization much (if ever), but a quick search shows that the binary package and the binary-derive package can do something similar for you, without having to write the instances yourself. I would see if those do what you want first.

sclv · Answer

As for in-memory haskell representations, we can't represent things fully packed typically since the structures are lazy, and that means we need an indirection at each level. That said, unpacking will let you crush these things together. But, as far as I know, it won't pack bits from nested constructors into the same word.

There is a pointer-tagging optimization that shoves some information about the constructor in the pointer that directs to it: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging

For more on unpacking see this: http://www.haskell.org/haskellwiki/Performance/Data_types#Unpacking_strict_fields

What's the most efficient way to represent finite (non-recursive) algebraic type values?

Tags:

serialization

haskell

algebraic-data-types

fadedbee

2 Answers

yatima2975

sclv

Recent Activity

Donate For Us

What's the most efficient way to represent finite (non-recursive) algebraic type values?

Tags:

serialization

haskell

algebraic-data-types

fadedbee

2 Answers

yatima2975

sclv

Related questions

Recent Activity

Donate For Us