Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't Haskell's Data.Set support infinite sets?

Tags:

haskell

set

In the following snippet:

import qualified Data.Set as Set

data Nat = Zero | Succ Nat deriving (Eq, Show, Ord)

instance Enum Nat where
  pred (Succ x)     = x
  succ x            = Succ x
  toEnum 0          = Zero
  toEnum x          = Succ (toEnum (x-1))
  fromEnum Zero     = 0
  fromEnum (Succ x) = 1 + (fromEnum x)

nats :: [Nat]
nats = [Zero ..]

natSet :: Set.Set Nat
natSet = Set.fromList nats

Why does:

  • elem (toEnum 100) nats == True

but

  • Set.member (toEnum 100) natSet never ends?
like image 706
Zinedine Avatar asked Aug 19 '11 15:08

Zinedine


4 Answers

The existing answers are sufficient, but I want to expound a little bit on the behavior of Sets.

Looks like you are hoping for a lazy set of all Nats; you take the infinite list of all Nats and use Set.toList on it. That would be nice; mathematicians often talk in terms of the "set of all natural numbers". The problem is the implementation of Set is not as accommodating of laziness as lists are.

The implementation of Set is based on size balanced binary trees (or trees of bounded balance)

The docs for Data.Set

Suppose you wish to lazily construct a binary tree from a list. Elements from the list would only be inserted into the tree when a deeper traversal of the tree was necessary. So then you ask if 100 is in the tree. It would go along adding numbers 1-99 to the tree, one at a time. Then it would finally add 100 to the tree, and discover that 100 is indeed an element in the tree. But notice what we did. We just performed an in-order traversal of the lazy list! So the first time, our imaginary LazyTree.contains would have roughly the same complexity as List.find (assuming an ammortized O(1) insert, which is a bad assumption for a simple binary tree, which would have O(log n) complexity). And without balancing, our tree would be very lopsided (we added the numbers 1 through 100 in order, so it would just be a big linked list down the right child of each branch). But with tree balancing during the traversal, it would be hard to know where to begin the traversal again; at least it certainly wouldn't be immediately intuitive.

tl;dr: nobody (afaik) has made a good lazy Set yet. So infinite Sets are more easily represented as infinite lists, for now.

like image 162
Dan Burton Avatar answered Nov 15 '22 14:11

Dan Burton


Set.fromList is not lazy, so it will not end if you pass it an infinite list. But natSet is not constructed until it is needed, so you only notice it when you run Set.member on it.

For example, even Set.null $ Set.fromList [0..] does not terminate.

like image 32
Sjoerd Visscher Avatar answered Nov 15 '22 15:11

Sjoerd Visscher


You can't have infinite sets. This doesn't just affect Set.member, whenever you do anything which will cause natSet to be evaluated even one step (even Set.null), it will go into an infinite loop.

like image 37
sepp2k Avatar answered Nov 15 '22 15:11

sepp2k


Let's see what happens when we adapt GHC's Set code to accommodate infinite sets:

module InfSet where

data InfSet a = Bin a (InfSet a) (InfSet a) 

-- create an infinite set by unfolding a value
ofUnfold :: (x -> (x, a, x)) -> x -> InfSet a
ofUnfold f x = 
  let (lx,a,rx) = f x
      l = ofUnfold f lx
      r = ofUnfold f rx
  in Bin a l r

-- check for membership in the infinite set
member :: Ord a => a -> InfSet a -> Bool
member x (Bin y l r) = case compare x y of
                         LT -> member x l
                         GT -> member x r
                         EQ -> True       

-- construct an infinite set representing a range of numbers
range :: Fractional a => (a, a) -> InfSet a
range = ofUnfold $ \(lo,hi) -> 
          let mid = (hi+lo) / 2
          in ( (lo, mid), mid, (mid, hi) )

Note how, instead of constructing the infinite set from an infinite list, I instead define a function ofUnfold to unfold a single value into an infinite list. It allows us to construct both branches lazily in parallel (we don't need to finish one branch before constructing another).

Let's give it a whirl:

ghci> :l InfSet
[1 of 1] Compiling InfSet           ( InfSet.hs, interpreted )
Ok, modules loaded: InfSet.
ghci> let r = range (0,128)
ghci> member 64 r
True
ghci> member 63 r
True
ghci> member 62 r
True
ghci> member (1/2) r
True
ghci> member (3/4) r
True

Well, that seems to work. What if we try a value outside of the Set?

ghci> member 129 r
^CInterrupted.

That will just run and run and never quit. There's no stopping branches in the inifinite set, so the search never quits. We could check the original range somehow, but that's not practical for infinite sets of discrete elements:

ghci> let ex = ofUnfold (\f -> ( f . (LT:), f [EQ], f . (GT:) )) id
ghci> :t ex
ex :: InfSet [Ordering]
ghci> member [EQ] ex
True
ghci> member [LT,EQ] ex
True
ghci> member [EQ,LT] ex
^CInterrupted.

So infinite sets are possible but I'm not sure they're useful.

like image 39
rampion Avatar answered Nov 15 '22 14:11

rampion