
Constructing efficient monad instances on `Set` (and other containers with constraints) using the continuation monad

Tags:

complexity-theory

haskell

monads

continuations

curry-howard

`Set`, similarly to `[]`, has perfectly well-defined monadic operations. The problem is that they require the values to satisfy an `Ord` constraint, so it's impossible to define `return` and `>>=` without any constraints. The same problem applies to many other data structures that place constraints on their element types.

The standard trick (suggested to me in a haskell-cafe post) is to wrap `Set` into the continuation monad. `ContT` doesn't care whether the underlying type functor has any constraints. The constraints are only needed when wrapping/unwrapping `Set`s into/from continuations:

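```
import Control.Monad.Cont
-- foldl' is taken from Data.Foldable and Data.Set is imported by name,
-- so nothing clashes with names exported by the Prelude.
import Data.Foldable (foldl', foldrM)
import Data.Set (Set, empty, fromList, singleton, union)

setReturn :: a -> Set a
setReturn = singleton

-- Apply f to every element and union the resulting sets.
setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = foldl' (\s -> union s . f) empty set

type SetM r a = ContT r Set a

-- Ord is only required on the final result type r.
fromSet :: (Ord r) => Set a -> SetM r a
fromSet = ContT . setBind

toSet :: SetM r r -> Set r
toSet c = runContT c setReturn
```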

This works as needed. For example, we can simulate a non-deterministic function that either increases its argument by 1 or leaves it intact:

```
step :: (Ord r) => Int -> SetM r Int
step i = fromSet $ fromList [i, i + 1]

-- repeated application of step:
stepN :: Int -> Int -> Set Int
stepN times start = toSet $ foldrM ($) start (replicate times step)
```

Indeed, `stepN 5 0` yields `fromList [0,1,2,3,4,5]`. If we used the `[]` monad, we would get

[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]

instead.
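To make the comparison with the list monad concrete, here is a minimal sketch. The names `stepList`, `stepNList` and `stepNViaList` are introduced here for illustration and are not part of the original code. It runs the same fold in the `[]` monad and deduplicates only at the very end, which is roughly the behaviour the `ContT` version is suspected of having.

```haskell
import Data.Foldable (foldrM)
import qualified Data.Set as Set

-- Hypothetical list-monad counterpart of step.
stepList :: Int -> [Int]
stepList i = [i, i + 1]

-- Same fold as stepN, but in the list monad: duplicates are never merged.
stepNList :: Int -> Int -> [Int]
stepNList times start = foldrM ($) start (replicate times stepList)

-- Deduplicate only once, after all alternatives have been enumerated.
stepNViaList :: Int -> Int -> Set.Set Int
stepNViaList times start = Set.fromList (stepNList times start)
```

Each application of `stepList` doubles the number of alternatives, so `stepNList 5 0` has 2^5 = 32 elements even though only 6 of them are distinct.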

The problem is efficiency. If we call `stepN 20 0`, the output takes a few seconds, and `stepN 30 0` doesn't finish within a reasonable amount of time. It turns out that all `Set.union` operations are performed at the end, instead of after each monadic computation. The result is that exponentially many `Set`s are constructed and `union`ed only at the end, which is unacceptable for most tasks.

Is there any way around it, to make this construction efficient? I tried, but without success.

(I even suspect that there could be some kind of theoretical limit following from the Curry-Howard isomorphism and Glivenko's theorem. Glivenko's theorem says that for any propositional tautology φ the formula ¬¬φ can be proved in intuitionistic logic. However, I suspect that the proof (in normal form) can be exponentially long. So, perhaps, there could be cases where wrapping a computation into the continuation monad makes it exponentially longer?)

679

asked Aug 29 '12 17:08

Petr

4 Answers

Monads are one particular way of structuring and sequencing computations. The bind of a monad cannot magically restructure your computation so as to happen in a more efficient way. There are two problems with the way you structure your computation.

1. When evaluating stepN 20 0, the result of step 0 will be computed 20 times. This is because each step of the computation produces 0 as one alternative, which is then fed to the next step, which also produces 0 as an alternative, and so on... Perhaps a bit of memoization here can help.

2. A much bigger problem is the effect of ContT on the structure of your computation. With a bit of equational reasoning, expanding out the result of replicate 20 step and the definition of foldrM, and simplifying as many times as necessary, we can see that stepN 20 0 is equivalent to:
```
(...(return 0 >>= step) >>= step) >>= step) >>= ...)
```
All parentheses of this expression associate to the left. That's great, because it means that the RHS of each occurrence of (>>=) is an elementary computation, namely step, rather than a composed one. However, zooming in on the definition of (>>=) for ContT,
```
m >>= k = ContT $ \c -> runContT m (\a -> runContT (k a) c)
```
we see that when evaluating a chain of (>>=) associating to the left, each bind will push a new computation onto the current continuation c. To illustrate what is going on, we can again use a bit of equational reasoning, expanding out this definition of (>>=) and the definition of runContT, and simplifying, which yields:
```
setReturn 0 `setBind`
    (\x1 -> step x1 `setBind`
        (\x2 -> step x2 `setBind` (\x3 -> ...)...)
```
Now, for each occurrence of setBind, let's ask ourselves what the RHS argument is. For the leftmost occurrence, the RHS argument is the whole rest of the computation after setReturn 0. For the second occurrence, it's everything after step x1, etc. Let's zoom in to the definition of setBind:
```
setBind set f = foldl' (\s -> union s . f) empty set
```
Here f represents all the rest of the computation, everything on the right hand side of an occurrence of setBind. That means that at each step, we are capturing the rest of the computation as f, and applying f as many times as there are elements in set. The computations are not elementary as before, but rather composed, and these computations will be duplicated many times.

The crux of the problem is that the ContT monad transformer transforms the initial structure of the computation, which you meant as a left-associative chain of setBinds, into a computation with a different structure, i.e. a right-associative chain. This is, after all, perfectly fine, because one of the monad laws says that, for every m, f and g, we have

```
(m >>= f) >>= g = m >>= (\x -> f x >>= g)
```

However, the monad laws do not require the complexity to remain the same on both sides of each equation. And indeed, in this case, the left-associative way of structuring this computation is a lot more efficient. The left-associative chain of setBinds evaluates in no time, because only elementary subcomputations are duplicated.
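The difference between the two association orders can be reproduced without `ContT` at all. The following is a minimal sketch under stated assumptions: `step` is redefined here directly on `Set` (unlike the question's `ContT`-based `step`), `setBind` is written with `Set.unions`, and `stepLeft` and `stepRight` are hypothetical names for the two shapes of the chain.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Restricted bind on Set: apply f to every element and union the results.
setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind s f = Set.unions (map f (Set.toList s))

-- Plain Set version of step (an assumption of this sketch).
step :: Int -> Set Int
step i = Set.fromList [i, i + 1]

-- Left-nested: ((setReturn start `setBind` step) `setBind` step) ...
-- Every right-hand side is the elementary step, and duplicates are merged
-- after each bind, so intermediate sets stay small.
stepLeft :: Int -> Int -> Set Int
stepLeft times start = foldl setBind (Set.singleton start) (replicate times step)

-- Right-nested: step x `setBind` (\x1 -> step x1 `setBind` (\x2 -> ...))
-- The right-hand side is the whole rest of the computation and is re-run
-- for each of the two alternatives, so the work doubles with every level.
stepRight :: Int -> Int -> Set Int
stepRight 0 x = Set.singleton x
stepRight n x = step x `setBind` stepRight (n - 1)
```

Both functions compute the same set, as the associativity law demands, but `stepRight` performs on the order of 2^n calls to `step` while `stepLeft` performs a number that grows only quadratically with n.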

It turns out that other solutions shoehorning Set into a monad also suffer from the same problem. In particular, the set-monad package yields similar runtimes. The reason is that it, too, rewrites left-associative expressions into right-associative ones.

I think you have put your finger on a very important yet rather subtle problem with insisting that Set obey a Monad interface. And I don't think it can be solved. The problem is that the type of the bind of a monad needs to be

```
(>>=) :: m a -> (a -> m b) -> m b
```

i.e. no class constraint is allowed on either a or b. That means that we cannot nest binds on the left without first invoking the monad laws to rewrite into a right-associative chain. Here's why: given (m >>= f) >>= g, the type of the computation (m >>= f) is of the form m b. A value of the computation (m >>= f) is of type b. But because we can't hang any class constraint onto the type variable b, we can't know that the value we got satisfies an Ord constraint, and therefore cannot use this value as an element of a set on which we want to be able to compute unions.

185

answered Sep 28 '22 04:09

macron

Recently on Haskell Cafe, Oleg gave an example of how to implement the Set monad efficiently. Quoting:

... And yet, the efficient genuine Set monad is possible.

... Enclosed is the efficient genuine Set monad. I wrote it in direct style (it seems to be faster, anyway). The key is to use the optimized choose function when we can.

  {-# LANGUAGE GADTs, TypeSynonymInstances, FlexibleInstances #-}

  module SetMonadOpt where

  import qualified Data.Set as S
  import Control.Monad

  data SetMonad a where
      SMOrd :: Ord a => S.Set a -> SetMonad a
      SMAny :: [a] -> SetMonad a

  instance Monad SetMonad where
      return x = SMAny [x]

      m >>= f = collect . map f $ toList m

  toList :: SetMonad a -> [a]
  toList (SMOrd x) = S.toList x
  toList (SMAny x) = x

  collect :: [SetMonad a] -> SetMonad a
  collect []  = SMAny []
  collect [x] = x
  collect ((SMOrd x):t) = case collect t of
                           SMOrd y -> SMOrd (S.union x y)
                           SMAny y -> SMOrd (S.union x (S.fromList y))
  collect ((SMAny x):t) = case collect t of
                           SMOrd y -> SMOrd (S.union y (S.fromList x))
                           SMAny y -> SMAny (x ++ y)

  runSet :: Ord a => SetMonad a -> S.Set a
  runSet (SMOrd x) = x
  runSet (SMAny x) = S.fromList x

  instance MonadPlus SetMonad where
      mzero = SMAny []
      mplus (SMAny x) (SMAny y) = SMAny (x ++ y)
      mplus (SMAny x) (SMOrd y) = SMOrd (S.union y (S.fromList x))
      mplus (SMOrd x) (SMAny y) = SMOrd (S.union x (S.fromList y))
      mplus (SMOrd x) (SMOrd y) = SMOrd (S.union x y)

  choose :: MonadPlus m => [a] -> m a
  choose = msum . map return


  test1 = runSet (do
    n1 <- choose [1..5]
    n2 <- choose [1..5]
    let n = n1 + n2
    guard $ n < 7
    return n)
  -- fromList [2,3,4,5,6]

  -- Values to choose from might be higher-order or actions
  test1' = runSet (do
    n1 <- choose . map return $ [1..5]
    n2 <- choose . map return $ [1..5]
    n  <- liftM2 (+) n1 n2
    guard $ n < 7
    return n)
  -- fromList [2,3,4,5,6]

  test2 = runSet (do
    i <- choose [1..10]
    j <- choose [1..10]
    k <- choose [1..10]
    guard $ i*i + j*j == k * k
    return (i,j,k))
  -- fromList [(3,4,5),(4,3,5),(6,8,10),(8,6,10)]

  test3 = runSet (do
    i <- choose [1..10]
    j <- choose [1..10]
    k <- choose [1..10]
    guard $ i*i + j*j == k * k
    return k)
  -- fromList [5,10]

  -- Test by Petr Pudlak

  -- First, general, unoptimal case
  step :: (MonadPlus m) => Int -> m Int
  step i = choose [i, i + 1]

  -- repeated application of step on 0:
  stepN :: Int -> S.Set Int
  stepN = runSet . f
    where
    f 0 = return 0
    f n = f (n-1) >>= step

  -- it works, but clearly exponential
  {-
  *SetMonad> stepN 14
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
  (0.09 secs, 31465384 bytes)
  *SetMonad> stepN 15
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
  (0.18 secs, 62421208 bytes)
  *SetMonad> stepN 16
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
  (0.35 secs, 124876704 bytes)
  -}

  -- And now the optimization
  chooseOrd :: Ord a => [a] -> SetMonad a
  chooseOrd x = SMOrd (S.fromList x)

  stepOpt :: Int -> SetMonad Int
  stepOpt i = chooseOrd [i, i + 1]

  -- repeated application of step on 0:
  stepNOpt :: Int -> S.Set Int
  stepNOpt = runSet . f
    where
    f 0 = return 0
    f n = f (n-1) >>= stepOpt

  {-
  stepNOpt 14
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
  (0.00 secs, 515792 bytes)
  stepNOpt 15
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
  (0.00 secs, 515680 bytes)
  stepNOpt 16
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
  (0.00 secs, 515656 bytes)

  stepNOpt 30
  fromList [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
  (0.00 secs, 1068856 bytes)
  -}
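The quoted module defines `Monad` and `MonadPlus` instances only, so it predates the requirement that every `Monad` also be a `Functor` and an `Applicative`. Below is a minimal sketch of the optimized part that should load on a current GHC. It assumes the `containers` package, and the helper `merge` is a name introduced here that replaces the quoted `collect` and `mplus` definitions.

```haskell
{-# LANGUAGE GADTs #-}

import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus, ap, liftM)
import qualified Data.Set as S

data SetMonad a where
  SMOrd :: Ord a => S.Set a -> SetMonad a
  SMAny :: [a] -> SetMonad a

toList :: SetMonad a -> [a]
toList (SMOrd x) = S.toList x
toList (SMAny x) = x

-- Combine two computations, switching to a real Set as soon as either
-- side carries an Ord dictionary.
merge :: SetMonad a -> SetMonad a -> SetMonad a
merge (SMAny x) (SMAny y) = SMAny (x ++ y)
merge (SMAny x) (SMOrd y) = SMOrd (S.union y (S.fromList x))
merge (SMOrd x) (SMAny y) = SMOrd (S.union x (S.fromList y))
merge (SMOrd x) (SMOrd y) = SMOrd (S.union x y)

instance Functor SetMonad where
  fmap = liftM

instance Applicative SetMonad where
  pure x = SMAny [x]
  (<*>) = ap

instance Monad SetMonad where
  m >>= f = foldr merge (SMAny []) (map f (toList m))

instance Alternative SetMonad where
  empty = SMAny []
  (<|>) = merge

instance MonadPlus SetMonad

runSet :: Ord a => SetMonad a -> S.Set a
runSet (SMOrd x) = x
runSet (SMAny x) = S.fromList x

-- The optimized choice: builds an Ord-backed set directly.
chooseOrd :: Ord a => [a] -> SetMonad a
chooseOrd = SMOrd . S.fromList

-- Left-nested chain, as in the quoted stepNOpt.
stepNOpt :: Int -> S.Set Int
stepNOpt = runSet . go
  where
    go 0 = pure 0
    go n = go (n - 1) >>= \i -> chooseOrd [i, i + 1]
```

Because every `chooseOrd` result is an `SMOrd`, each bind merges duplicates immediately, so the intermediate sets never grow beyond n + 1 elements.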

answered Sep 28 '22 03:09

Petr

I don't think your performance problems in this case are due to the use of Cont:

```
step' :: Int -> Set Int
step' i = fromList [i,i + 1]

foldrM' f z0 xs = Prelude.foldl f' setReturn xs z0
  where f' k x z = f x z `setBind` k

stepN' :: Int -> Int -> Set Int
stepN' times start = foldrM' ($) start (replicate times step')
```

This gets similar performance to the Cont-based implementation, but occurs entirely in the Set "restricted monad".

I am not sure I believe your claim about Glivenko's theorem leading to an exponential increase in (normalized) proof size, at least in the call-by-need context. That is because we can arbitrarily reuse subproofs (and, since our logic is second order, we need only a single proof of forall a. ~~(a \/ ~a)). Proofs are not trees, they are graphs (sharing).

In general, you are likely to see performance costs from Cont wrapping Set, but they can usually be avoided via

```
smash :: (Ord r, Ord k) => SetM r r -> SetM k r
smash = fromSet . toSet
```
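As a rough illustration of how `smash` might be used, the sketch below collapses the computation back into a `Set` before every bind. It restates the question's definitions so that it is self-contained, writes `setBind` with `Set.unions`, and introduces the hypothetical name `stepNSmash`. The local signature on `go` is needed because the recursive call is used at a different answer type.

```haskell
import Control.Monad.Cont (ContT (ContT), runContT)
import Data.Set (Set)
import qualified Data.Set as Set

type SetM r a = ContT r Set a

setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind s f = Set.unions (map f (Set.toList s))

fromSet :: Ord r => Set a -> SetM r a
fromSet = ContT . setBind

toSet :: SetM r r -> Set r
toSet c = runContT c Set.singleton

-- Run the computation to a Set, then re-wrap it at a new answer type.
smash :: (Ord r, Ord k) => SetM r r -> SetM k r
smash = fromSet . toSet

step :: Ord r => Int -> SetM r Int
step i = fromSet (Set.fromList [i, i + 1])

-- Hypothetical variant of stepN that smashes before each step, so the
-- duplicates are merged once per level instead of only at the end.
stepNSmash :: Int -> Int -> Set Int
stepNSmash times start = toSet (go times)
  where
    go :: Ord k => Int -> SetM k Int
    go 0 = return start
    go n = smash (go (n - 1)) >>= step
```

With this shape, each level is evaluated once to an explicit `Set`, so `stepNSmash 30 0` should no longer need exponentially many unions.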

answered Sep 28 '22 04:09

Philip JF

I found out another possibility, based on GHC's ConstraintKinds extension. The idea is to redefine Monad so that it includes a parametric constraint on allowed values:

```
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE RebindableSyntax #-}

import qualified Data.Foldable as F
import qualified Data.Set as S
import Prelude hiding (Monad(..), Functor(..))

class CFunctor m where
    -- Each instance defines a constraint its values must satisfy:
    type Constraint m a
    -- The default is no constraints.
    type Constraint m a = ()
    fmap   :: (Constraint m a, Constraint m b) => (a -> b) -> (m a -> m b)
class CFunctor m => CMonad (m :: * -> *) where
    return :: (Constraint m a) => a -> m a
    (>>=)  :: (Constraint m a, Constraint m b) => m a -> (a -> m b) -> m b
    fail   :: String -> m a
    fail   = error

-- [] instance
instance CFunctor [] where
    fmap = map
instance CMonad [] where
    return  = (: [])
    (>>=)   = flip concatMap

-- Set instance
instance CFunctor S.Set where
    -- Sets need Ord.
    type Constraint S.Set a = Ord a
    fmap = S.map
instance CMonad S.Set where
    return  = S.singleton
    (>>=)   = flip F.foldMap

-- Example:

-- prints fromList [3,4,5]
main = print $ do
    x <- S.fromList [1,2]
    y <- S.fromList [2,3]
    return $ x + y
```

(The problem with this approach arises when the monadic values are functions, such as m (a -> b), because they can't satisfy constraints like Ord (a -> b). So one can't use combinators like <*> (or ap) for this constrained Set monad.)
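For readers on a recent GHC, here is a minimal sketch of the same idea without `RebindableSyntax`. It rests on a few assumptions: the associated family is given an explicit `Constraint` result kind (current compilers may require this), and the names `Elem`, `creturn`, `cbind` and `sums` are introduced here to avoid clashing with `Data.Kind.Constraint` and the Prelude.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Constraint, Type)
import qualified Data.Set as S

class CMonad (m :: Type -> Type) where
  -- Constraint that element types must satisfy; none by default.
  type Elem m a :: Constraint
  type Elem m a = ()
  creturn :: Elem m a => a -> m a
  cbind :: (Elem m a, Elem m b) => m a -> (a -> m b) -> m b

-- Lists place no constraint on their elements.
instance CMonad [] where
  creturn x = [x]
  cbind xs f = concatMap f xs

-- Sets need Ord on their elements.
instance CMonad S.Set where
  type Elem S.Set a = Ord a
  creturn = S.singleton
  cbind s f = S.unions (map f (S.toList s))

-- Same computation as the do-block above, written with explicit binds.
sums :: S.Set Int
sums =
  S.fromList [1, 2] `cbind` \x ->
    S.fromList [2, 3] `cbind` \y ->
      creturn (x + y)
```

Evaluating `sums` gives the same set as the do-block example, `fromList [3,4,5]`. Since `cbind` carries the `Ord` constraint on both element types, left-nested chains can merge duplicates after every step.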

answered Sep 28 '22 03:09

Petr
