Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the idiomatic way to represent algebraic datatype constructors in Scheme (R6RS)?

I'm wondering what is the best way to translate Haskell-like datatypes to Scheme. My current plan is to represent constructors with vectors, with the first element being a label representing the variant. So, for example, the following Haskell program:

data Bits       = O Bits | I Bits | E deriving Show
data Nat        = S Nat  | Z          deriving Show
inc (O pred)    = I pred
inc (I pred)    = O (inc pred)
inc E           = E
dup (S pred)    = let (x,y) = dup pred in (S x, S y)
dup Z           = (Z, Z)
bus Z        bs = inc bs
bus (S pred) bs = let (x,y) = (pred,pred) in (bus pred (bus pred bs))
o32             = (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E))))))))))))))))))))))))))))))))
n26             = (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))))))))))))))))
main            = print (bus n26 o32)

Would be translated as:

(define (O pred)   (vector 'O pred))
(define (I pred)   (vector 'I pred))
(define E          (vector 'E))
(define (S pred)   (vector 'S pred))
(define Z          (vector 'Z))
(define (Inc bits) (case (vector-ref bits 0) ('O (I (vector-ref bits 1))) ('I (O (Inc (vector-ref bits 1)))) ('E E)))
(define (Dup val)  (case (vector-ref val 0) ('S (let ((xy (Dup (vector-ref val 1)))) (cons (S (car xy)) (S (cdr xy))))) ('Z (cons Z Z))))
(define (Bus n bs) (case (vector-ref n 0) ('Z (Inc bs)) ('S (let ((xy (Dup (vector-ref n 1)))) (Bus (car xy) (Bus (cdr xy) bs))))))
(define O32        (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O (O E)))))))))))))))))))))))))))))))))
(define N26        (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))))))))))))))))))
(display (Bus N26 O32))

To my surprise, this actually performs quite well (Scheme is faster than Haskell here). But I wonder if this is the best way to do it? Is this reasonable, or is there some more "idiomatic" translation, which would be expected to perform better?

like image 417
MaiaVictor Avatar asked Apr 11 '21 16:04

MaiaVictor


People also ask

Why do we need algebraic data types in programming?

There’s two main kinds of algebraic data types: Sum types and Product types. Together, they’re like a dynamic duo for encoding business logic. They help us make good things possible, and bad things impossible. What next? I’ve tried to describe the ideas in functional programming that confused me most.

Are algebraic data types the same as algebraic structures in typescript?

TypeScript and Flow have some support for algebraic data types. But they’re both fairly limited. So, algebraic data types are not the same thing as algebraic structures. But they’re both useful. And it’s good to know the difference. It seems the smug functional programmer was right. Algebraic data types are handy for encoding business logic.

What are the algebraic data types in Python?

Arrays, Objects, Maps, WeakMaps, and Sets are all algebraic data types. You can put values of many types ‘inside’ them (so to speak).

Can you make algebraic data types with named parameters?

If you abuse C# named parameters (introduced in C# 4.0), you can make algebraic data types that are easy to match on:


1 Answers

Broadly, I think you have two approaches: a “positive” encoding like you’ve done here, where you have vectors denoting products containing (dynamic) tags to distinguish sums, and a “negative” encoding (Böhm–Berarducci / visitor) where you represent data types by the recursion scheme for consuming them.

An example of a BB-encoding version in Haskell:

{-# Language RankNTypes #-}

-- The type of /folds/ that consume bits.
newtype Bits = Bits
  { matchBits
    :: forall r.  -- Caller-specified result type.
       (r -> r)   -- O: 1 recursive field
    -> (r -> r)   -- I: 1 recursive field
    -> r          -- E: no fields; could be ‘() -> r’
    -> r
  }

-- The type of one /recursive unrolling/ of bits.
newtype Bits' = Bits'
  { matchBits'
    :: forall r.    -- Also any result type.
       (Bits -> r)  -- But note! ‘Bits’ instead of ‘r’.
    -> (Bits -> r)
    -> r
    -> r
  }

-- Basic constructors retain their types.
mkI, mkO :: Bits -> Bits
mkE :: Bits

mkI', mkO' :: Bits' -> Bits'
mkE' :: Bits'

-- Constructor functions perform visitor dispatch.
-- This is where the recursion happens in ‘matchBits’.
mkO pred = Bits $ \  o  i e -> o (matchBits pred o i e)
mkI pred = Bits $ \  o  i e -> i (matchBits pred o i e)
mkE      = Bits $ \ _o _i e -> e

-- General recursive dispatch is similar.
mkO' pred = Bits' $ \  o  i e -> o (matchBits' pred mkO mkI mkE)
mkI' pred = Bits' $ \  o  i e -> i (matchBits' pred mkO mkI mkE)
mkE'      = Bits' $ \ _o _i e -> e

-- Recursive deconstruction, used below.
recurBits :: Bits -> Bits'
recurBits bits = matchBits bits mkO' mkI' mkE'

-- We only need a fold for nats here.
newtype Nat = Nat
  { matchNat
    :: forall r.  -- Result type.
       (r -> r)   -- S: 1 recursive field
    -> r          -- Z: no fields; also could be ‘() -> r’
    -> r
  }

mkS :: Nat -> Nat
mkZ :: Nat

mkS pred = Nat $ \  s z -> s (matchNat pred s z)
mkZ      = Nat $ \ _s z -> z

-- Case branches with ‘matchBits’ receive the /result/
-- of the recursive call on a recursive field. So this
-- is /not/ what we want:
--
-- > inc bits = matchBits bits mkI (mkO . inc) mkE
--
-- Instead, we want the field itself, so we must use
-- the recursive ‘matchBits'’.
inc :: Bits -> Bits
inc bits = matchBits' (recurBits bits) mkI (mkO . inc) mkE

-- Or: ‘dup nat = matchNat nat (mkS *** mkS) (mkZ, mkZ)’
-- Or: ‘dup nat = (nat, nat)’ = ‘dup = join (,)’
dup :: Nat -> (Nat, Nat)
dup nat = matchNat nat
  (\ (x, y) -> (mkS x, mkS y))  -- S
  (mkZ, mkZ)                    -- Z

-- NB: think of as ‘Nat -> (Bits -> Bits)’.
bus :: Nat -> Bits -> Bits
bus n = matchNat n
  (\ f -> f . f)  -- S
  inc             -- Z

And you can translate that more or less directly into Scheme. Here’s an untested and probably incorrect sketch of a translation, just to illustrate a starting point:

(define (O pred)  (lambda (o i e) (o (pred o i e))))
(define (I pred)  (lambda (o i e) (i (pred o i e))))
(define E         (lambda (o i e) (e)))

(define (O_ pred) (lambda (o i e) (o (pred O I E))))
(define (I_ pred) (lambda (o i e) (i (pred O I E))))
(define E_        (lambda (o i e) (e)))

(define (S pred)  (lambda (s z) (s (pred s z))))
(define Z         (lambda (s z) (z)))

(define (recurBits bits) (bits O_ I_ E_))

(define (Inc bits)
  ((recurBits bits)
     I
     (lambda (pred) (O (Inc pred)))
     E))

(define (Dup val)
  (val (lambda (p)
         (let ((x (car p))
               (y (cdr p)))
           (cons (S x) (S y))))
       (cons Z Z)))

(define (Bus n bs)
  ((Bus_ n) bs))
(define (Bus_ n)
  (n (lambda (pred) (lambda (bs) (pred (pred bs))))
     inc))

You may need to add some explicit parameters or additional lambdas in a couple places to handle differences in partial application and lazy evaluation, like the awkwardness with Bus_.

In general, though, I’d expect this method to have comparable or better performance characteristics in many applications. Instead of vectors, it’s relying on closures, which can likely be compiled better because the language implementation knows more about their structure. Instead of dynamically dispatching on values, it selects among functions to (tail-)call, avoiding the construction of some values.

I also found Oleg Kiselyov’s notes on BB-encoding a helpful resource when learning about this technique in Haskell.

like image 156
Jon Purdy Avatar answered Nov 01 '22 07:11

Jon Purdy