I've a data structure with several different types of internal circular linking, making it infinite in the sense of say the cycle
command. Are there any interesting modules for collapsing such structures into flat data structures that use indexing instead?
I'm interested in serializing the complete data structure, both via Read
and Show
as well as via Data.Serialize
or similar.
There are obviously nice features of building a sequential index, but an index based upon hash values of memory addresses might work fine too.
This isn't really possible; there is no way to detect, from within pure code, that the list repeat 1
is cyclic. Indeed, it could even not be cyclic, on sufficiently esoteric implementations of Haskell.
It is technically possible on common implementations of Haskell, however, using a technique called observable sharing,1 but it's rather complicated, and unless you're a very experienced Haskell programmer, most of the time you don't really want to solve your problem with observable sharing.
Unless you have very special requirements that would make this a good idea, I would suggest instead representing the cyclic graph that your structure represents directly; for instance, using a graph library such as the standard Data.Graph or FGL.
If you do want to try observable sharing, though, I would suggest using the data-reify package (as described in Type-Safe Observable Sharing in Haskell), which only requires a simple type-class instance and has a safe, IO
-based interface; it's essentially based on memory addresses (actually, StableName
s), as you suggested.
One of the reasons you shouldn't use observable sharing unless you really have a need to is that it breaks referential transparency; for instance, it allows you to distinguish repeat 1
and 1 : repeat 1
, and even let xs = 1 : xs in xs
and let f x = x : f x in f 1
, of these depending on compiler optimisations!
1 The basic idea behind observable sharing is to expose the sharing done by standard Haskell implementations, which can be basically summarised as "duplicating values doesn't copy them"; so let xs = 1:xs in xs
is a cyclic list, because the same value for the entirety of xs
is used for its tail, rather than recomputing it each time, and (\x -> (x,x)) expensive
(where expensive
is some long-running computation that produces a large result) just results in two pointers to expensive
, rather than copying the thunk (which means that, even if you force both fields, the computation will only be done once, and the result will only be stored once in memory).
As mentioned in @Paul Johnson's answer, you can build a labelled version of your structure and use the labels as a means of referencing different parts of your structure.
But when you need it you can eliminate the labels leaving an optimised structure with all of its knots tied. You keep the original version for serialisation, but use the optimised version when your algorithms need it.
Here is some code:
import Debug.Trace
import Data.Map
data Tree a name = Leaf a | Fork (Tree a name) (Tree a name) | Name name deriving Show
instance Monad (Tree a) where
return x = Name x
Leaf a >>= f = Leaf a
Fork l r >>= f = Fork (l >>= f) (r >>= f)
Name a >>= f = f a
tie :: Map String (Tree Int String) -> Map String (Tree Int ())
tie tree = let result = fmap (>>= find) tree
find name = trace ("Looking up " ++ name) $ flip (findWithDefault (Leaf 0)) result name
in result
treerefs :: Map String (Tree Int String)
treerefs = ("root" `insert` Fork (Name "left") (Name "right")) $
("left" `insert` Fork (Leaf 1) (Name "root")) $
("right" `insert` Fork (Name "root") (Leaf 2)) $
empty
trees = tie treerefs
root = findWithDefault (Leaf 0) "root" trees
p (Leaf a) = "Leaf " ++ show a
p (Fork _ _) = "Fork"
p (Name n) = "Name " ++ show n
l (Fork a _) = a
r (Fork _ b) = b
test = p (r (r (r (l (r (l (l (r (l (r (r (l root))))))))))))
Notice how you can easily serialise treerefs
but you can traverse root
fast. I put the trace
call in so you can see how often name lookup takes place. It really does close the loop (in GHC at least).
More generically, instead of Tree
you can use any free monad generated by a functor and you can use the method here to speed up the knot tying. (My example above doesn't fully make use of the Monad
instance but I wanted to make the connection with that paper.) Compare also with loeb
.
If your data elements have unique identifiers (some papers use the term "names") then you can build a table of those identifiers and the ways that they relate to other identifiers. In OO implementations (and in the "observable sharing" described by ehird) this is done using the memory address of the object as an ID. Functional programming only has values, not objects with memory addresses, so you have to add the unique ID yourself when you build the structure. Then you can detect cycles by testing whether an ID belongs to the set of those already seen.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With