Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell: Common Corecursive Fallacies

So I was thinking about a graph distance algorithm tonight, and came up with this while I was driving in the car:

module GraphDistance where
import Data.Map

distance :: (Ord a) => a -> Map a [a] -> Map a Int
distance a m = m' 
  where m' = mapWithKey d m
        d a' as = if a == a' then 0 else 1 + minimum (Prelude.map (m'!) as)

At first, I was fairly proud of myself, since it seemed so elegant. But then I realized it wouldn't work - the corecursion might get stuck in a loop.

I had to code it up to convince myself:

ghci> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,^CInterrupted.

But now I think I've pretty much thought it through.

Is there a list of common errors and anti-patterns when dealing with corecursive data that I can go read up on, so I can train my brain to think corecursively? Experience has trained me fairly well for thinking through non-corecursive data, but I'd like to learn from others mistakes if I can.

like image 784
rampion Avatar asked Jun 08 '11 03:06

rampion


1 Answers

Well, there's really only one fundamental mistake when dealing with corecursive data, and that's carelessly using recursion on it!

Corecursion implies that data is being generated incrementally in some sense. Your graph distance function is instructive here, because it seems like it should work, so think about where the incremental part should be: The starting point is a distance of 0 from a node to itself, otherwise one greater than the minimum distance among its own immediate neighbors. Thus, we would expect each distance value to be incremental, which means we need them to be suitably corecursive themselves.

The recursion at issue, then, is occurring due to the combination of (+) and minimum: when finding the minimum, 1 will always be less than 1 + n, without needing to worry about what n is... but there's no way to compare Ints without computing the entire value.

In short, the algorithm expects to be able to compare how many times (1 +) was applied to 0 only as far as needed; that is to say, it wants lazy natural numbers defined using "zero" and "successor".

Behold:

data Nat = Z | S Nat

natToInt :: Nat -> Int
natToInt Z = 0
natToInt (S n) = 1 + natToInt n

instance Show Nat where show = show . natToInt

instance Eq Nat where
    Z == Z = True
    (S n1) == (S n2) = n1 == n2
    _ == _ = False

    Z /= Z = False
    (S n1) /= (S n2) = n1 /= n2
    _ /= _ = True


instance Ord Nat where
    compare Z Z = EQ
    compare Z (S _) = LT
    compare (S _) Z = GT
    compare (S n1) (S n2) = compare n1 n2

And then in GHCi:

> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,1),(3,2)]

Proof that your algorithm works[0]; your implementation of it was just incorrect.


Now, as a slight variation, let's apply your algorithm to a different graph:

> distance 1 $ fromList [(0,[1]),(1,[0]),(2,[3]),(3,[2])]

...what do we expect this to do? What is the distance from node 1 for nodes 2 or 3?

Running it in GHCi has the obvious result:

fromList [(0,1),(1,0),(2,^CInterrupted.

Nevertheless, the algorithm works correctly on this graph. Can you see the problem? Why did it hang in GHCi?


In summary, you need to clearly distinguish between two forms that can't be mixed freely:

  • Lazy, possibly infinite data, generated corecursively
  • Finite data, consumed recursively

Both forms can be transformed in a structure-agnostic way (e.g., map works on finite and infinite lists both). Codata can be consumed incrementally, driven by a corecursive algorithm; data can be generated recursively, limited by a recursive algorithm.

What you can't do is consume codata recursively (e.g., left folding an infinite list) or generate data corecursively (rare in Haskell, due to laziness).

[0]: Actually, it will fail on some inputs (e.g., some disconnected graphs), but that's a different issue.

like image 63
C. A. McCann Avatar answered Nov 02 '22 23:11

C. A. McCann