In a project I'm working on, data of a certain type may sometimes contain themselves in it. For example,
data Example = Apple Int
| Pear Int Example
a = Pear 10 a
b = Pear 10 b
As a programmer I know that a
and b
are equal, but when I actually test equality between them it would infinite loop because their values need to be evaluated for comparison.
Is there any other way to do equality testing between data like these? Or is there a way to avoid problems like these?
I'd like to talk a little bit about how to really do this right, if this is a thing you want to do. It's harder than you might guess at first -- and easier than you might guess at second! For fun, I'm going to make the problem slightly harder; we'll eventually be representing the values
a = Pear 10 a
b = Pear 10 (Pear 10 b)
c = Apple 10
and computing that a
and b
are "equal" -- for a precise sense of equal we'll discuss below. This is not a thing that observable sharing will necessarily give you for free.
The precise sense of equality that we'll use in the sequel is bisimilarity. Normally, bisimilarity -- and its close relative, bisimulation -- are presented as relations between two labelled graphs. For our purposes, you should be picturing here nodes in the graph containing some data at the current constructor of our data type, and edges as pointing to subterms. Thus, for the value Pear 10 (Pear 20 (Apple 30))
, we might have the graph Pear 10 -> Pear 20 -> Apple 30
; and the value b
above would be the cyclic graph
Pear 10 -> Pear 10
^ |
\_______/
Observable sharing will you get you this far, but will not get you to the point where it's obvious how to decide that these two graphs are equivalent:
Pear 10 -. Pear 10 -> Pear 10
^ | ~= ^ |
\____/ \_______/
If you're familiar with an algorithm used for minimizing DFAs, you can probably stop here; such algorithms are easy to adapt for the purpose of testing equality of regular languages, and that's essentially what we'll be doing below.
The key insight is that all three nodes in the two graphs above behave essentially the same -- any path you can take starting from a node in the left graph has an "identical" path in the right graph. That is, suppose we have a relation R between nodes in the left and right graphs satisfying this property:
if nl R nr
and there is an edge (nl, e, nl') in the graph,
then there is an edge (nr, e, nr') in the graph
and nl' R nr'
We'll call R a bisimulation. The largest relation R will be called a bisimilarity. If the "root" nodes in two graphs are bisimilar, then the corresponding Haskell values are equal! At this point, I hope you have gotten to the point where the problem seems harder than you guessed at first; perhaps impossible. After all, how are we supposed to get our hands on the largest such relation?
One answer is to start with the complete relation and cut out any pairs of nodes that violate the above constraints. Keep iterating that process until nothing changes, and see what we've got left. It turns out you can prove that this process actually produces the bisimilarity! We'll implement this below in a pretty naive way; you can google about for efficient bisimilarity algorithms if you want to have more speed.
First a preamble. We'll use the fgl package for our graph representation.
import Control.Monad.Reader
import Data.Graph.Inductive hiding (ap)
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as M
import qualified Data.Set as S
The fgl package defines a type Node
for the identities of nodes. We'll represent our relation simply as
type Relation = Set (Node, Node)
To begin with, we want the complete relation for a pair of graphs. While we're at it, we might as well cut out any pairs whose node labels don't match right away. (A note on naming conventions I chose: in fgl, every node and edge has a label -- which can be of any type you like -- and an identity -- which must be of type Node
or Edge
. Our names will reflect this when possible: an n
prefix for nodes, e
for edges, i
for identity, and v
for label/value. We'll use l
and r
as a suffix for our left-hand and right-hand graphs.)
labeledPairs :: (Eq n, Graph gr) => gr n e -> gr n e' -> Relation
labeledPairs l r = S.fromList
[ (nil, nir)
| (nil, nvl) <- labNodes l
, (nir, nvr) <- labNodes r
, nvl == nvr
]
Now, the next piece is to check whether two nodes satisfy the "single-step relatedness" condition we described above. That is, for each edge out of one of the nodes, we're looking for an edge out of the other with the same label and leading to another node that we currently claim is related. Transliterating this search to Haskell:
-- assumption: nil and nir are actual nodes in graphs l and r, respectively
ssRelated :: (Ord e, Graph gr) => gr n e -> gr n e -> Relation -> Node -> Node -> Bool
ssRelated l r rel nil nir = rell && relr where
el = out l nil
er = out r nil
mel = M.fromListWith (++) [(evl, [nil]) | (_, nil, evl) <- el]
mer = M.fromListWith (++) [(evr, [nir]) | (_, nir, evr) <- er]
rell = and
[ or [(nil, nir) `S.member` rel | nir <- M.findWithDefault [] evl mer]
| (_, nil, evl) <- el
]
relr = and
[ or [(nil, nir) `S.member` rel | nil <- M.findWithDefault [] evr mel]
| (_, nir, evr) <- er
]
We can now write a function which checks each pair of nodes for single-step suitability:
prune :: (Ord e, Graph gr) => gr n e -> gr n e -> Relation -> Relation
prune l r rel = S.filter (uncurry (ssRelated l r rel)) rel
To compute the bisimilarity, as said above, we'll start with the complete relation and repeatedly prune away nodes that don't fit the criteria.
bisimilarity :: (Eq n, Ord e, Graph gr) => gr n e -> gr n e -> Relation
bisimilarity l r
= fst . head
. dropWhile (uncurry (/=))
. ap zip tail
. iterate (prune l r)
$ labeledPairs l r
Now we can check whether two graphs have the same infinite unrolling by picking root nodes in each graph and checking them for bisimilarity:
-- assumption: the root of the graph is node 0
bisimilar :: (Eq n, Ord e, Graph gr) => gr n e -> gr n e -> Bool
bisimilar l r = (0, 0) `S.member` bisimilarity l r
Now let's see it in action! We'll make analogs of a
, b
, and c
from earlier in the answer in the graph representation. Since our data type only ever has one possible recursive occurrence, we don't need interesting edge labels. The mkGraph
function takes a list of labeled nodes and a list of labeled edges and builds a graph out of them.
data NodeLabel = Apple Int | Pear Int
deriving (Eq, Ord, Read, Show)
type EdgeLabel = ()
a, b, c :: Gr NodeLabel EdgeLabel
a = mkGraph [(0, Pear 10)] [(0, 0, ())]
b = mkGraph [(0, Pear 10), (1, Pear 10)] [(0, 1, ()), (1, 0, ())]
c = mkGraph [(0, Apple 10)] []
In ghci:
*Main> bisimilar a b
True
*Main> bisimilar a c
False
*Main> bisimilar b c
False
*Main> bisimilar a a
True
*Main> bisimilar b b
True
*Main> bisimilar c c
True
Looks good! Making it fast and hooking it up to an observable-sharing library are left as an exercise to the reader. And keep in mind that, though this method can handle graphs with infinite unrollings, of course one would probably have trouble handling infinite graphs in this way!
In general: no. This sort of equality testing reduces to the halting problem. One way to see this is that we could express the execution of a Turing machine on some input as an infinite data structure like this. Another way of looking at it is that lazy data structures can represent arbitrary suspended computations.
The only real way to avoid problems like this is to either set some additional limitations on your data structure or check for something more limited than equality.
An example of the first approach would be to make cycles in your data type explicit with references of some sort, letting you detect them as you go along. This will limit exactly what you can express with your data structure but will also let your equality predicate detect cycles robustly. I think you could do this with observable sharing; take a look at data-reify for a package that does this. This approach should make checking a very directly recursive structure like your example easy.
For the second approach, you could have a function that returns a Maybe Bool
: if it can't figure out whether the two structures are equal or not in X steps, it returns Nothing
. Depending on how your data types are created, you could probably ensure that any types beyond a certain size with the same prefix are almost always equal and just rely on that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With