What is the idiomatic way to represent algebraic datatype constructors in Scheme (R6RS)?

Tags:

I'm wondering what is the best way to translate Haskell-like datatypes to Scheme. My current plan is to represent constructors with vectors, with the first element being a label representing the variant. So, for example, the following Haskell program:

data Bits       = O Bits | I Bits | E deriving Show
data Nat        = S Nat  | Z          deriving Show
inc (O pred)    = I pred
inc (I pred)    = O (inc pred)
inc E           = E
dup (S pred)    = let (x,y) = dup pred in (S x, S y)
dup Z           = (Z, Z)
bus Z        bs = inc bs
bus (S pred) bs = let (x,y) = (pred,pred) in (bus pred (bus pred bs))
o32             = (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E))))))))))))))))))))))))))))))))
n26             = (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))))))))))))))))
main            = print (bus n26 o32)

Would be translated as:

(define (O pred)   (vector 'O pred))
(define (I pred)   (vector 'I pred))
(define E          (vector 'E))
(define (S pred)   (vector 'S pred))
(define Z          (vector 'Z))
(define (Inc bits) (case (vector-ref bits 0) ('O (I (vector-ref bits 1))) ('I (O (Inc (vector-ref bits 1)))) ('E E)))
(define (Dup val)  (case (vector-ref val 0) ('S (let ((xy (Dup (vector-ref val 1)))) (cons (S (car xy)) (S (cdr xy))))) ('Z (cons Z Z))))
(define (Bus n bs) (case (vector-ref n 0) ('Z (Inc bs)) ('S (let ((xy (Dup (vector-ref n 1)))) (Bus (car xy) (Bus (cdr xy) bs))))))
(define O32        (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E)))))))))))))))))))))))))))))))))
(define N26        (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))))))))))))))))))
(display (Bus N26 O32))

To my surprise, this actually performs quite well (Scheme is faster than Haskell here). But I wonder if this is the best way to do it? Is this reasonable, or is there some more "idiomatic" translation, which would be expected to perform better?

417

asked Apr 11 '21 16:04

MaiaVictor

1 Answers

Broadly, I think you have two approaches: a “positive” encoding like you’ve done here, where you have vectors denoting products containing (dynamic) tags to distinguish sums, and a “negative” encoding (Böhm–Berarducci / visitor) where you represent data types by the recursion scheme for consuming them.

An example of a BB-encoding version in Haskell:

{-# Language RankNTypes #-}

-- The type of /folds/ that consume bits.
newtype Bits = Bits
  { matchBits
    :: forall r.  -- Caller-specified result type.
       (r -> r)   -- O: 1 recursive field
    -> (r -> r)   -- I: 1 recursive field
    -> r          -- E: no fields; could be ‘() -> r’
    -> r
  }

-- The type of one /recursive unrolling/ of bits.
newtype Bits' = Bits'
  { matchBits'
    :: forall r.    -- Also any result type.
       (Bits -> r)  -- But note! ‘Bits’ instead of ‘r’.
    -> (Bits -> r)
    -> r
    -> r
  }

-- Basic constructors retain their types.
mkI, mkO :: Bits -> Bits
mkE :: Bits

mkI', mkO' :: Bits' -> Bits'
mkE' :: Bits'

-- Constructor functions perform visitor dispatch.
-- This is where the recursion happens in ‘matchBits’.
mkO pred = Bits $ \  o  i e -> o (matchBits pred o i e)
mkI pred = Bits $ \  o  i e -> i (matchBits pred o i e)
mkE      = Bits $ \ _o _i e -> e

-- General recursive dispatch is similar.
mkO' pred = Bits' $ \  o  i e -> o (matchBits' pred mkO mkI mkE)
mkI' pred = Bits' $ \  o  i e -> i (matchBits' pred mkO mkI mkE)
mkE'      = Bits' $ \ _o _i e -> e

-- Recursive deconstruction, used below.
recurBits :: Bits -> Bits'
recurBits bits = matchBits bits mkO' mkI' mkE'

-- We only need a fold for nats here.
newtype Nat = Nat
  { matchNat
    :: forall r.  -- Result type.
       (r -> r)   -- S: 1 recursive field
    -> r          -- Z: no fields; also could be ‘() -> r’
    -> r
  }

mkS :: Nat -> Nat
mkZ :: Nat

mkS pred = Nat $ \  s z -> s (matchNat pred s z)
mkZ      = Nat $ \ _s z -> z

-- Case branches with ‘matchBits’ receive the /result/
-- of the recursive call on a recursive field. So this
-- is /not/ what we want:
--
-- > inc bits = matchBits bits mkI (mkO . inc) mkE
--
-- Instead, we want the field itself, so we must use
-- the recursive ‘matchBits'’.
inc :: Bits -> Bits
inc bits = matchBits' (recurBits bits) mkI (mkO . inc) mkE

-- Or: ‘dup nat = matchNat nat (mkS *** mkS) (mkZ, mkZ)’
-- Or: ‘dup nat = (nat, nat)’ = ‘dup = join (,)’
dup :: Nat -> (Nat, Nat)
dup nat = matchNat nat
  (\ (x, y) -> (mkS x, mkS y))  -- S
  (mkZ, mkZ)                    -- Z

-- NB: think of as ‘Nat -> (Bits -> Bits)’.
bus :: Nat -> Bits -> Bits
bus n = matchNat n
  (\ f -> f . f)  -- S
  inc             -- Z

And you can translate that more or less directly into Scheme. Here’s an untested and probably incorrect sketch of a translation, just to illustrate a starting point:

(define (O pred)  (lambda (o i e) (o (pred o i e))))
(define (I pred)  (lambda (o i e) (i (pred o i e))))
(define E         (lambda (o i e) (e)))

(define (O_ pred) (lambda (o i e) (o (pred O I E))))
(define (I_ pred) (lambda (o i e) (i (pred O I E))))
(define E_        (lambda (o i e) (e)))

(define (S pred)  (lambda (s z) (s (pred s z))))
(define Z         (lambda (s z) (z)))

(define (recurBits bits) (bits O_ I_ E_))

(define (Inc bits)
  ((recurBits bits)
     I
     (lambda (pred) (O (Inc pred)))
     E))

(define (Dup val)
  (val (lambda (p)
         (let ((x (car p))
               (y (cdr p)))
           (cons (S x) (S y))))
       (cons Z Z)))

(define (Bus n bs)
  ((Bus_ n) bs))
(define (Bus_ n)
  (n (lambda (pred) (lambda (bs) (pred (pred bs))))
     inc))

You may need to add some explicit parameters or additional lambdas in a couple places to handle differences in partial application and lazy evaluation, like the awkwardness with Bus_.

In general, though, I’d expect this method to have comparable or better performance characteristics in many applications. Instead of vectors, it’s relying on closures, which can likely be compiled better because the language implementation knows more about their structure. Instead of dynamically dispatching on values, it selects among functions to (tail-)call, avoiding the construction of some values.

I also found Oleg Kiselyov’s notes on BB-encoding a helpful resource when learning about this technique in Haskell.

156

answered Nov 01 '22 07:11

Jon Purdy

Related questions
                            
                                Haskell: how to separate interface from implementation
                            
                                Type safe lookup on heterogeneous lists in Haskell
                            
                                Control.Arrow: Why "let (a,b) = (first,second)" fails?
                            
                                Simplest package for web animation with Haskell
                            
                                GADT type argument not being used for typeclass resolution
                            
                                'Kind' of confused about forall in type-indexed GADTs
                            
                                Hide a constructor but not the type on import
                            
                                Is there a way to make Haddock render per-argument docs for type class methods?
                            
                                Test that a Haskell function has a space leak
                            
                                `coerce` and instantiation of type variables
                            
                                Doing dependency injection using monad stacks
                            
                                With Nix, how can I specify Haskell dependencies with profiling enabled?
                            
                                What is the dual of a prism or an affine traversal?
                            
                                Is it a good or a bad thing that a suite of quickcheck tests match the implementations?
                            
                                Different space behaviour when compiling from source than when linking to a package
                            
                                How do i check if a string can be parsed to a certain type in Haskell?
                            
                                Why does mutual yielding make ArrowApply and Monads equivalent, unlike Arrow and Applicative?
                            
                                Is every Alternative Monad Filterable?
                            
                                How to replace ImplicitParams with the 'reflection' package?
                            
                                How to require functional dependencies in kind signature?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the idiomatic way to represent algebraic datatype constructors in Scheme (R6RS)?

Tags:

idioms

compilation

haskell

algebraic-data-types

scheme

MaiaVictor

People also ask

1 Answers

Jon Purdy

Recent Activity

Donate For Us