Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OCaml functors (parametrized modules) emulation in Haskell

Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?

For an instance, I need the module that implements the complex generic computation, that may be parmetrized with different misc. types, functions, etc. To be more specific, let it be kMeans implementation that could be parametrized with different types of values, vector types (list, unboxed vector, vector, tuple, etc), and distance calculation strategy.

For convenience, to avoid crazy amount of intermediate types, I want to have this computation polymorphic by DataSet class, that contains all required interfaces. I also tried to use TypeFamilies to avoid a lot of typeclass parameters (that cause problems as well):

{-# Language MultiParamTypeClasses
           , TypeFamilies
           , FlexibleContexts
           , FlexibleInstances
           , EmptyDataDecls
           , FunctionalDependencies
           #-}

module Main where

import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kulback-Leibler "distance"
-- import Distances.Kullback as K

class ( Num (Elem c)
     ,  Ord (TLabel c)
     ,  WithDistance (TVect c) (Elem c)
     ,  WithDistance  (TBoxType c) (Elem c)
     ) 
     => DataSet c  where
    type Elem c  ::  *
    type TLabel c ::  *
    type TVect c :: * -> *
    data TDistType  c :: *
    data TObservation c :: *
    data TBoxType c :: * -> *
    observations :: c -> [TObservation c]
    measurements :: TObservation c -> [Elem c]
    label        :: TObservation c -> TLabel c
    distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
    distance = distance_

instance DataSet () where
  type Elem () = Float
  type TLabel () = Int
  data TObservation () = TObservationUnit [Float]
  data TDistType ()
  type TVect () = V.Vector 
  data TBoxType () v = VectorBox (V.Vector v)
  observations () = replicate 10 (TObservationUnit [0,0,0,0])
  measurements (TObservationUnit xs) = xs
  label (TObservationUnit _) = 111 

kMeans :: ( Floating (Elem c)
          , DataSet c
          ) => c
            -> [TObservation c]
kMeans s = undefined -- here the implementation
  where
    labels = map label (observations s)
    www  = L.map (V.fromList.measurements) (observations s)
    zzz  = L.zipWith distance_ www www
    wtf1 = L.foldl wtf2 0 (observations s)
    wtf2 acc xs = acc + L.sum (measurements xs)
    qq = V.fromList [1,2,3 :: Float]
    l = distance (VectorBox qq) (VectorBox qq)

instance Floating a => WithDistance (TBoxType ()) a where
  distance_ xs ys = undefined

instance Floating a => WithDistance V.Vector a where
  distance_  xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x+y)**2) xs ys)

This code somehow compiles and work, but it's pretty ugly and hacky.

The kMeans should be parametrized by value type (number, float point number, anything), box type (vector,list,unboxed vector, tuple may be) and distance calculation strategy.

There are also types for Observation (that's the type of sample provided by user, there should be a lot of them, measurements that contained in each observation).

So the problems are:

1) If the function does not contains the parametric types in it's signature, types will not be deduced

2) Still no idea, how to declare typeclass WithDistance to have different instances for different distance type (Euclid, Kullback, anything else via phantom types).

Right now WithDistance just polymorphic by box type and value type, so if we need different strategies, we may only put them in different modules and import the required module. But this is a hack and non-typed approach, right?

All of this may be done pretty easy in OCaml with is't modules. What the proper approach to implement such things in Haskell?

Typeclasses with TypeFamilies somehow look similar to parametric modules, but they work different. I really need something like that.

like image 692
voidlizard Avatar asked Oct 12 '14 06:10

voidlizard


1 Answers

It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/

But I think you can get a bit further without those ML modules. Your design follows God class anti-pattern and that is why it is anti-modular.

Type class can be useful only if every type can have no more than a single instance of that class. E.g. DataSet () instance fixes type TVect () = V.Vector and you can't easily create similar instance but with TVect = U.Vector.

You need to start with implementing kMeans function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.

Here is little example. At first you have some non-general implementation:

kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...

Then you generalize it by distance calculation strategy:

kMeans
   :: Int
   -> ((Double,Double) -> (Double,Double) -> Double)
   -> [(Double,Double)]
   -> [[(Double,Double)]]
kMeans k distance points = ...

Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:

kMeans
    :: Point p
    => Int -> (p -> p -> Coord p) -> [p]
    -> [[p]]
kMeans k distance points = ...

class Num (Coord p) => Point p where
    type Coord p
    coords :: p -> [Coord p]

euclidianDistance
    :: (Point p, Floating (Coord p))
    => p -> p -> Coord p
euclidianDistance a b
    = sum $ map (**2) $ zipWith (-) (coords a) (coords b)

Now you may wish to make it a bit faster by replacing lists with vectors:

kMeans
    :: (Point p, DataSet vec p)
    => Int -> (p -> p -> Coord p) -> vec p
    -> [vec p]
kMeans k distance points = ...

class DataSet vec p where
  map :: ...
  foldl' :: ...

instance Unbox p => DataSet U.Vector p where
  map = U.map
  foldl' = U.foldl'

And so on.

Suggested approach is to generalize various parts of algorithm and constrain those parts with small loosely coupled type classes (when required).
It is a bad style to collect everything in a single monolithic type class.

like image 105
max taldykin Avatar answered Oct 13 '22 07:10

max taldykin