Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Capturing rules of graph using types in Scala, OCaml and Haskell

I am trying to describe a complex graph with many different types of nodes and edges which can only be connected to each other according to a set of rules. I would like these rules to be checked at compile time using the type system of the language. There are many different node and edge types in my real application.

I have easily created a simple example in Scala:

sealed trait Node {
  val name: String
}
case class NodeType1(override val name: String) extends Node
case class NodeType2(override val name: String) extends Node
case class NodeType3(override val name: String) extends Node

sealed trait Edge
case class EdgeType1(source: NodeType1, target: NodeType2) extends Edge
case class EdgeType2(source: NodeType2, target: NodeType1) extends Edge

object Edge {
  def edgeSource(edge: Edge): Node = edge match {
    case EdgeType1(src, _) => src
    case EdgeType2(src, _) => src
  }
}

object Main {
  def main(args: Array[String]) {
    val n1 = NodeType1("Node1")
    val n2 = NodeType2("Node2")
    val edge = EdgeType1(n1, n2)
    val source = Edge.edgeSource(edge)
    println(source == n1)  // true
  }
}

A valid graph can only connect a given edge type between the given types of nodes as shown in the Scala example above. The function "edgeSource" extracts the source node from the edge, as simple as that.

Here comes a non working example of what I would like to write in OCaml:

type node =
    NodeType1 of string
  | NodeType2 of string

type edge =
    EdgeType1 of NodeType1 * NodeType2
  | EdgeType2 of NodeType2 * NodeType1

let link_source (e : edge) : node =
  match e with
  | EdgeType1 (src, _) -> src
  | EdgeType2 (src, _) -> src

The problem here is that "NodeTypeX" are constructors and not types. Hence, I am unable to use them when I describe the tuples with source and target for which the edges are defined. The "link_source" function can only return one type and "node" is the variant which can return something.

I have been trying out how to fix this in both OCaml and Haskell and here is an example of one go in OCaml where the node type wraps node_type_X:

type node_type_1 = NodeType1 of string
type node_type_2 = NodeType2 of string

type node =
    NodeType1Node of node_type_1
  | NodeType2Node of node_type_2

type edge =
    EdgeType1 of node_type_1 * node_type_2
  | EdgeType2 of node_type_2 * node_type_1

let link_source (e : edge) : node =
  match e with
  | EdgeType1 (src, _) -> NodeType1Node src
  | EdgeType2 (src, _) -> NodeType2Node src

But the problem with this is that I am duplicating the type information. I am specifying the source node type in the definition of edge, and it is also given when matching the edge in link_source as NodeTypeXNode.

Obviously I am not understanding how to solve this problem. I am stuck thinking in class hierarchies. What would be the correct way to express what I am achieving in the Scala code above in OCaml or Haskell?

like image 286
Stefan Larsson Avatar asked Aug 14 '15 19:08

Stefan Larsson


3 Answers

Edit: the answer with GADTs is much more direct.

Here's a Haskell version (without unsafeCoerce), which is one possible translation of your Scala code. I can't help with an OCaml solution however.

Note that, in Haskell, == cannot be used on values of different type (and the ability to do so in Scala is frequently frowned upon and a source of annoyance and bugs). However, I've provided a solution below for comparing different node types if you really need it. If you don't truly need it, I'd recommend avoiding it, as it depends on GHC features/extensions that make your code less portable and could potentially even cause problems for the type checker.

WITHOUT polymorphic node comparison:

{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
-- the FlexibleContexts extension can be eliminated
-- by removing the constraint on edgeSource.

-- let's start with just the data types
data NodeType1 = NodeType1 { name1 :: String } deriving Eq
data NodeType2 = NodeType2 { name2 :: String } deriving Eq
data NodeType3 = NodeType3 { name3 :: String } deriving Eq

data EdgeType1 = EdgeType1 { source1 :: NodeType1, target1 :: NodeType2 }
data EdgeType2 = EdgeType2 { source2 :: NodeType2, target2 :: NodeType1 }

-- you tell the compiler that the node types
-- somehow "belong together" by using a type class
class    Node a         where name :: a -> String
instance Node NodeType1 where name = name1
instance Node NodeType2 where name = name2
instance Node NodeType3 where name = name3

-- same about the edges, however in order to
-- map each Edge type to a different Node type,
-- you need to use TypeFamilies; see
-- https://wiki.haskell.org/GHC/Type_families
class Edge a where
  type SourceType a
  -- the constraint here isn't necessary to make
  -- the code compile, but it ensures you can't
  -- map Edge types to non-Node types.
  edgeSource :: Node (SourceType a) => a -> SourceType a

instance Edge EdgeType1 where
  type SourceType EdgeType1 = NodeType1
  edgeSource = source1

instance Edge EdgeType2 where
  type SourceType EdgeType2 = NodeType2
  edgeSource = source2

main = do
  let n1     = NodeType1 "Node1"
      n2     = NodeType2 "Node2"
      edge   = EdgeType1 n1 n2
      source = edgeSource edge
  print (source == n1)  -- True
--  print (source == n2)  -- False  -- DOESN'T COMPILE

WITH polymorphic node comparison:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- again, constraint not required but makes sure you can't
-- define node equality for non-Node types.
class (Node a, Node b) => NodeEq a b where
  nodeEq :: a -> b -> Bool

-- I wasn't able to avoid OVERLAPPING/OVERLAPS here.
-- Also, if you forget `deriving Eq` for a node type N,
-- `nodeEq` justs yield False for any a, b :: N, without warning.
instance {-# OVERLAPPING #-} (Node a, Eq a)   => NodeEq a a where
  nodeEq = (==)
instance {-# OVERLAPPING #-} (Node a, Node b) => NodeEq a b where
  nodeEq _ _ = False

main = do
  let n1     = NodeType1 "Node1"
      n2     = NodeType2 "Node2"
      edge   = EdgeType1 n1 n2
      source = edgeSource edge
  print (source `nodeEq` n1)  -- True
  print (source `nodeEq` n2)  -- False

The abov is not the only way to tell the Haskell type system about your constraints, for example functional dependencies seem applicable, and GADTs.


Explanation:

It's worth understanding why the solution seems to be more direct in Scala.

Scala's a hybrid between subtype polymorphism based OO such as the one found in C++, Java/C#, Python/Ruby, and (often Haskell-like) functional programming, which typically avoids subtyping a.k.a. datatype inheritance, and resorts to other, arguably better, forms of polymorphism.

In Scala, the way you define ADTs is by encoding them as a sealed trait + a number of (potentially sealed) case classes and/or case objects. However, this is only a pure ADT only if you never refer to the types of the case objects and case classes, so as to pretend they are like the Haskell or ML ADTs. However, your Scala solution indeed uses those types, i.e. it points "into" the ADT.

There's no way to do that in Haskell as individual constructors of an ADT do not have a distinct type. Instead, if you need to type-distinguish between individual constructors of an ADT, you need to split the original ADT into separate ADTs, one per constructor of the original ADT. You then "group" these ADTs together, so as to be able to refer to all of them in your type signatures, by putting them in a type class, which is a form of ad-hoc polymorphism.

like image 196
Erik Kaplun Avatar answered Oct 19 '22 15:10

Erik Kaplun


I think the most straightforward translation of your Scala version is using phantom types to mark the node and edge type and bind them to specific constructors with GADTs.

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}

data Type = Type1 | Type2

data Edge t where
    EdgeType1 :: Node Type1 -> Node Type2 -> Edge Type1
    EdgeType2 :: Node Type2 -> Node Type1 -> Edge Type2

data Node t where
    NodeType1 :: String -> Node Type1
    NodeType2 :: String -> Node Type2

instance Eq (Node t) where
    NodeType1 a == NodeType1 b = a == b
    NodeType2 a == NodeType2 b = a == b

edgeSource :: Edge t -> Node t
edgeSource (EdgeType1 src _) = src
edgeSource (EdgeType2 src _) = src

main :: IO ()
main = do
    let n1   = NodeType1 "Node1"
        n2   = NodeType2 "Node2"
        edge = EdgeType1 n1 n2
        src  = edgeSource edge

    print $ src == n1

This is now actually safer than the Scala version since we know the exact type returned from edgeSource statically instead of just getting an abstract base class that we would need to type-cast or pattern match against.

If you want to mimic the Scala version exactly, you could hide the phantom type in an existential wrapper to return a generic, "unknown" Node from edgeSource.

{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE FlexibleInstances #-}

data Some t where
    Some :: t x -> Some t

edgeSource :: Edge t -> Some Node
edgeSource (EdgeType1 src _) = Some src
edgeSource (EdgeType2 src _) = Some src

label :: Node t -> String
label (NodeType1 l) = l
label (NodeType2 l) = l

instance Eq (Some Node) where
    Some n1 == Some n2 = label n1 == label n2
like image 5
shang Avatar answered Oct 19 '22 16:10

shang


You were asking too much of the Ocaml type system. At this point in your second attempt:

let link_source (e : edge) : node =
match e with
| EdgeType1 (src, _) -> 

you are saying: It should be clear that src is of node_type_1, and I gave the return type node, so the compiler should be able to sort out the correct constructor to use from the type of src. However this is not possible in general: In a given variant, there is not a unique mapping from 'member types' to constructors; for example: type a = A of int | B of int. So you do have to specify the constructor (you could name it shorter).

If you don't want that you have to use polymorphism. One option is make the src_link function polymorphic. One attempt would be

type e12 = node_type_1 * node_type_2
type e21 = node_type_2 * node_type_1

let link_source = fst

but then you have to expose the link types as tuples separately. Another option is to use polymorphic variants.

type node1 = [`N1 of string]
type node2 = [`N2 of string]
type node3 = [`N3 of string]
type node = [node1 | node2 | node3]
type edge = E12 of node1 * node2 | E21 of node2 * node1

then one could write

let link_source (e:edge) : [<node] = match e with
  | E12 (`N1 s, _) -> `N1 s 
  | E21 (`N2 s, _) -> `N2 s

this automaticallly unifies the return type and checks that it's an existing node. The last pattern match can also be handled by type coercion:

let link_source (e:edge) : node = match e with
  | E12 (n1, _) -> (n1:>node)
  | E21 (n2, _) -> (n2:>node)

GADTs can also help. With the same definitions for node{,1,2,3} above one can define

type ('a, 'b) edge =
  | E12 : node1 * node2 -> (node1, node2) edge
  | E21 : node2 * node1 -> (node2, node1) edge

and then a polymorphic

let link_source : type a b . (a, b) edge -> a = function
  | E12 (n1, _) -> n1
  | E21 (n2, _) -> n2

addendum: when using GADTs it's not necessary to use polymorphic variants. So, one can just have

type node1 = N1 of string
type node2 = N2 of string
type node3 = N3 of string

and the same definitions of edge and link_source will work.

like image 4
user3240588 Avatar answered Oct 19 '22 16:10

user3240588