So I was thinking about a graph distance algorithm tonight, and came up with this while I was driving in the car:
module GraphDistance where
import Data.Map
distance :: (Ord a) => a -> Map a [a] -> Map a Int
distance a m = m'
where m' = mapWithKey d m
d a' as = if a == a' then 0 else 1 + minimum (Prelude.map (m'!) as)
At first, I was fairly proud of myself, since it seemed so elegant. But then I realized it wouldn't work - the corecursion might get stuck in a loop.
I had to code it up to convince myself:
ghci> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,^CInterrupted.
But now I think I've pretty much thought it through.
Is there a list of common errors and anti-patterns when dealing with corecursive data that I can go read up on, so I can train my brain to think corecursively? Experience has trained me fairly well for thinking through non-corecursive data, but I'd like to learn from others mistakes if I can.
Well, there's really only one fundamental mistake when dealing with corecursive data, and that's carelessly using recursion on it!
Corecursion implies that data is being generated incrementally in some sense. Your graph distance function is instructive here, because it seems like it should work, so think about where the incremental part should be: The starting point is a distance of 0 from a node to itself, otherwise one greater than the minimum distance among its own immediate neighbors. Thus, we would expect each distance value to be incremental, which means we need them to be suitably corecursive themselves.
The recursion at issue, then, is occurring due to the combination of (+)
and minimum
: when finding the minimum, 1
will always be less than 1 + n
, without needing to worry about what n
is... but there's no way to compare Int
s without computing the entire value.
In short, the algorithm expects to be able to compare how many times (1 +)
was applied to 0
only as far as needed; that is to say, it wants lazy natural numbers defined using "zero" and "successor".
Behold:
data Nat = Z | S Nat
natToInt :: Nat -> Int
natToInt Z = 0
natToInt (S n) = 1 + natToInt n
instance Show Nat where show = show . natToInt
instance Eq Nat where
Z == Z = True
(S n1) == (S n2) = n1 == n2
_ == _ = False
Z /= Z = False
(S n1) /= (S n2) = n1 /= n2
_ /= _ = True
instance Ord Nat where
compare Z Z = EQ
compare Z (S _) = LT
compare (S _) Z = GT
compare (S n1) (S n2) = compare n1 n2
And then in GHCi:
> distance 1 $ fromList [(0,[1]),(1,[0,2]),(2,[1,3]),(3,[2])]
fromList [(0,1),(1,0),(2,1),(3,2)]
Proof that your algorithm works[0]; your implementation of it was just incorrect.
Now, as a slight variation, let's apply your algorithm to a different graph:
> distance 1 $ fromList [(0,[1]),(1,[0]),(2,[3]),(3,[2])]
...what do we expect this to do? What is the distance from node 1 for nodes 2 or 3?
Running it in GHCi has the obvious result:
fromList [(0,1),(1,0),(2,^CInterrupted.
Nevertheless, the algorithm works correctly on this graph. Can you see the problem? Why did it hang in GHCi?
In summary, you need to clearly distinguish between two forms that can't be mixed freely:
Both forms can be transformed in a structure-agnostic way (e.g., map
works on finite and infinite lists both). Codata can be consumed incrementally, driven by a corecursive algorithm; data can be generated recursively, limited by a recursive algorithm.
What you can't do is consume codata recursively (e.g., left folding an infinite list) or generate data corecursively (rare in Haskell, due to laziness).
[0]: Actually, it will fail on some inputs (e.g., some disconnected graphs), but that's a different issue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With