Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?
For an instance, I need the module that implements the complex generic computation, that may be parmetrized with different misc. types, functions, etc. To be more specific, let it be kMeans implementation that could be parametrized with different types of values, vector types (list, unboxed vector, vector, tuple, etc), and distance calculation strategy.
For convenience, to avoid crazy amount of intermediate types, I want to have this computation polymorphic by DataSet class, that contains all required interfaces. I also tried to use TypeFamilies to avoid a lot of typeclass parameters (that cause problems as well):
{-# Language MultiParamTypeClasses
, TypeFamilies
, FlexibleContexts
, FlexibleInstances
, EmptyDataDecls
, FunctionalDependencies
#-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kulback-Leibler "distance"
-- import Distances.Kullback as K
class ( Num (Elem c)
, Ord (TLabel c)
, WithDistance (TVect c) (Elem c)
, WithDistance (TBoxType c) (Elem c)
)
=> DataSet c where
type Elem c :: *
type TLabel c :: *
type TVect c :: * -> *
data TDistType c :: *
data TObservation c :: *
data TBoxType c :: * -> *
observations :: c -> [TObservation c]
measurements :: TObservation c -> [Elem c]
label :: TObservation c -> TLabel c
distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
distance = distance_
instance DataSet () where
type Elem () = Float
type TLabel () = Int
data TObservation () = TObservationUnit [Float]
data TDistType ()
type TVect () = V.Vector
data TBoxType () v = VectorBox (V.Vector v)
observations () = replicate 10 (TObservationUnit [0,0,0,0])
measurements (TObservationUnit xs) = xs
label (TObservationUnit _) = 111
kMeans :: ( Floating (Elem c)
, DataSet c
) => c
-> [TObservation c]
kMeans s = undefined -- here the implementation
where
labels = map label (observations s)
www = L.map (V.fromList.measurements) (observations s)
zzz = L.zipWith distance_ www www
wtf1 = L.foldl wtf2 0 (observations s)
wtf2 acc xs = acc + L.sum (measurements xs)
qq = V.fromList [1,2,3 :: Float]
l = distance (VectorBox qq) (VectorBox qq)
instance Floating a => WithDistance (TBoxType ()) a where
distance_ xs ys = undefined
instance Floating a => WithDistance V.Vector a where
distance_ xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x+y)**2) xs ys)
This code somehow compiles and work, but it's pretty ugly and hacky.
The kMeans should be parametrized by value type (number, float point number, anything), box type (vector,list,unboxed vector, tuple may be) and distance calculation strategy.
There are also types for Observation (that's the type of sample provided by user, there should be a lot of them, measurements that contained in each observation).
So the problems are:
1) If the function does not contains the parametric types in it's signature, types will not be deduced
2) Still no idea, how to declare typeclass WithDistance to have different instances for different distance type (Euclid, Kullback, anything else via phantom types).
Right now WithDistance just polymorphic by box type and value type, so if we need different strategies, we may only put them in different modules and import the required module. But this is a hack and non-typed approach, right?
All of this may be done pretty easy in OCaml with is't modules. What the proper approach to implement such things in Haskell?
Typeclasses with TypeFamilies somehow look similar to parametric modules, but they work different. I really need something like that.
It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/
But I think you can get a bit further without those ML modules. Your design follows God class anti-pattern and that is why it is anti-modular.
Type class can be useful only if every type can have no more than a single instance of that class. E.g. DataSet ()
instance fixes type TVect () = V.Vector
and you can't easily create similar instance but with TVect = U.Vector
.
You need to start with implementing kMeans
function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.
Here is little example. At first you have some non-general implementation:
kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...
Then you generalize it by distance calculation strategy:
kMeans
:: Int
-> ((Double,Double) -> (Double,Double) -> Double)
-> [(Double,Double)]
-> [[(Double,Double)]]
kMeans k distance points = ...
Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:
kMeans
:: Point p
=> Int -> (p -> p -> Coord p) -> [p]
-> [[p]]
kMeans k distance points = ...
class Num (Coord p) => Point p where
type Coord p
coords :: p -> [Coord p]
euclidianDistance
:: (Point p, Floating (Coord p))
=> p -> p -> Coord p
euclidianDistance a b
= sum $ map (**2) $ zipWith (-) (coords a) (coords b)
Now you may wish to make it a bit faster by replacing lists with vectors:
kMeans
:: (Point p, DataSet vec p)
=> Int -> (p -> p -> Coord p) -> vec p
-> [vec p]
kMeans k distance points = ...
class DataSet vec p where
map :: ...
foldl' :: ...
instance Unbox p => DataSet U.Vector p where
map = U.map
foldl' = U.foldl'
And so on.
Suggested approach is to generalize various parts of algorithm and constrain those parts with small loosely coupled type classes (when required).
It is a bad style to collect everything in a single monolithic type class.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With