Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Any nice tools for untying knots in Haskell?

I've a data structure with several different types of internal circular linking, making it infinite in the sense of say the cycle command. Are there any interesting modules for collapsing such structures into flat data structures that use indexing instead?

I'm interested in serializing the complete data structure, both via Read and Show as well as via Data.Serialize or similar.

There are obviously nice features of building a sequential index, but an index based upon hash values of memory addresses might work fine too.

like image 876
Jeff Burdges Avatar asked Jan 20 '12 17:01

Jeff Burdges


3 Answers

This isn't really possible; there is no way to detect, from within pure code, that the list repeat 1 is cyclic. Indeed, it could even not be cyclic, on sufficiently esoteric implementations of Haskell.

It is technically possible on common implementations of Haskell, however, using a technique called observable sharing,1 but it's rather complicated, and unless you're a very experienced Haskell programmer, most of the time you don't really want to solve your problem with observable sharing.

Unless you have very special requirements that would make this a good idea, I would suggest instead representing the cyclic graph that your structure represents directly; for instance, using a graph library such as the standard Data.Graph or FGL.

If you do want to try observable sharing, though, I would suggest using the data-reify package (as described in Type-Safe Observable Sharing in Haskell), which only requires a simple type-class instance and has a safe, IO-based interface; it's essentially based on memory addresses (actually, StableNames), as you suggested.

One of the reasons you shouldn't use observable sharing unless you really have a need to is that it breaks referential transparency; for instance, it allows you to distinguish repeat 1 and 1 : repeat 1, and even let xs = 1 : xs in xs and let f x = x : f x in f 1, of these depending on compiler optimisations!

1 The basic idea behind observable sharing is to expose the sharing done by standard Haskell implementations, which can be basically summarised as "duplicating values doesn't copy them"; so let xs = 1:xs in xs is a cyclic list, because the same value for the entirety of xs is used for its tail, rather than recomputing it each time, and (\x -> (x,x)) expensive (where expensive is some long-running computation that produces a large result) just results in two pointers to expensive, rather than copying the thunk (which means that, even if you force both fields, the computation will only be done once, and the result will only be stored once in memory).

like image 59
ehird Avatar answered Nov 05 '22 12:11

ehird


As mentioned in @Paul Johnson's answer, you can build a labelled version of your structure and use the labels as a means of referencing different parts of your structure.

But when you need it you can eliminate the labels leaving an optimised structure with all of its knots tied. You keep the original version for serialisation, but use the optimised version when your algorithms need it.

Here is some code:

import Debug.Trace
import Data.Map

data Tree a name = Leaf a | Fork (Tree a name) (Tree a name) | Name name deriving Show

instance Monad (Tree a) where
    return x = Name x
    Leaf a >>= f = Leaf a
    Fork l r >>= f = Fork (l >>= f) (r >>= f)
    Name a >>= f = f a

tie :: Map String (Tree Int String) -> Map String (Tree Int ())
tie tree = let result = fmap (>>= find) tree
               find name = trace ("Looking up " ++ name) $ flip (findWithDefault (Leaf 0)) result name
           in  result

treerefs :: Map String (Tree Int String)
treerefs = ("root"  `insert` Fork (Name "left") (Name "right")) $
           ("left"  `insert` Fork (Leaf 1)      (Name "root"))  $
           ("right" `insert` Fork (Name "root") (Leaf 2))       $
           empty

trees = tie treerefs
root = findWithDefault (Leaf 0) "root" trees

p (Leaf a) = "Leaf " ++ show a
p (Fork _ _) = "Fork"
p (Name n) = "Name " ++ show n

l (Fork a _) = a
r (Fork _ b) = b

test = p (r (r (r (l (r (l (l (r (l (r (r (l root))))))))))))

Notice how you can easily serialise treerefs but you can traverse root fast. I put the trace call in so you can see how often name lookup takes place. It really does close the loop (in GHC at least).

A tree with cyclic references

More generically, instead of Tree you can use any free monad generated by a functor and you can use the method here to speed up the knot tying. (My example above doesn't fully make use of the Monad instance but I wanted to make the connection with that paper.) Compare also with loeb.

like image 35
sigfpe Avatar answered Nov 05 '22 13:11

sigfpe


If your data elements have unique identifiers (some papers use the term "names") then you can build a table of those identifiers and the ways that they relate to other identifiers. In OO implementations (and in the "observable sharing" described by ehird) this is done using the memory address of the object as an ID. Functional programming only has values, not objects with memory addresses, so you have to add the unique ID yourself when you build the structure. Then you can detect cycles by testing whether an ID belongs to the set of those already seen.

like image 33
Paul Johnson Avatar answered Nov 05 '22 13:11

Paul Johnson