What's the most efficient way to serialize finite (non-recursive) algebraic-data-types which are comprised only of constructors?
e.g.
p = A
| B q
q = C
| D r
| E
r = F
| G
Manually enumerating all valid combinations for this trivially small definition is possible:
A 0x00
B C 0x01
B D F 0x02
B D G 0x03
B E 0x04
Is there any broader theory here?
How about if we then add non-constructor types, such as ints, etc.?
How does Haskell represent these in memory (it allows recursion, so pointers/references will probably be necessary)?
There's no completely standard class that does this, but it's pretty easy to make one yourself. I'll sketch one way of doing this:
data P = A | B Q deriving Show
data Q = C | D R | E deriving Show
data R = F | G deriving Show
class Finite a where
allValues :: [a]
instance Finite P where
allValues = [A] ++ map B allValues
instance Finite Q where
allValues = [C] ++ map D allValues ++ [E]
instance Finite R where
allValues = [F] ++ [G]
I've written the instances this way to show that it's very easy and mechanical and could be done by a program (e.g. with generic programming or with Template Haskell). You could also add an instance to do some legwork for you, provided the type is Bounded
and Enum
erable:
instance (Bounded a, Enum a) => Finite a where
allValues = [minBound..maxBound]
If you now add deriving (Bounded, Show)
to R
, that's one less instance to write!
Anyway, now we can evaluate allValues :: [P]
and get back [A,B C,B (D F),B (D G),B E]
- which you can then zip
with [0..]
to get your encoding and so on.
But surely this has been done before! I'm not using serialization much (if ever), but a quick search shows that the binary package and the binary-derive package can do something similar for you, without having to write the instances yourself. I would see if those do what you want first.
As for in-memory haskell representations, we can't represent things fully packed typically since the structures are lazy, and that means we need an indirection at each level. That said, unpacking will let you crush these things together. But, as far as I know, it won't pack bits from nested constructors into the same word.
There is a pointer-tagging optimization that shoves some information about the constructor in the pointer that directs to it: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/HaskellExecution/PointerTagging
For more on unpacking see this: http://www.haskell.org/haskellwiki/Performance/Data_types#Unpacking_strict_fields
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With