Basic question: what design principles should one follow when choosing between a class and a record (with polymorphic fields)?
First, we know that classes and records are essentially equivalent (since in Core, classes get desugared to dictionaries, which are just records). Nevertheless, there are differences: classes are passed implicitly, records must be explicit.
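To make the implicit/explicit distinction concrete, here is a minimal sketch of the same behaviour written both ways. The names `Pretty`, `PrettyDict`, `describe` and `describeWith` are made up for illustration; they are not from any library.

```haskell
-- Class version: the compiler picks the dictionary from the type.
class Pretty a where
  pretty :: a -> String

instance Pretty Bool where
  pretty True  = "yes"
  pretty False = "no"

describe :: Pretty a => [a] -> String
describe = unwords . map pretty

-- Record version: the same dictionary, but passed by hand.
newtype PrettyDict a = PrettyDict { prettyWith :: a -> String }

boolDict :: PrettyDict Bool
boolDict = PrettyDict (\b -> if b then "yes" else "no")

describeWith :: PrettyDict a -> [a] -> String
describeWith d = unwords . map (prettyWith d)
```

Calling `describe [True, False]` and `describeWith boolDict [True, False]` should give the same string; the only difference is who supplies the dictionary.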
Looking a little deeper, classes are really useful when:
Classes are awkward when we have (up to parametric polymorphism) only one representation of our data, but we have multiple instances. This leads to the syntactic noise of having to use newtype to add extra tags (which exist only in our code, as we know such tags get erased at run time) if we don't want to turn on all sorts of troublesome extensions (i.e. overlapping and/or undecidable instances).
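As a rough illustration of that newtype noise, consider combining Ints either by addition or by multiplication. The class `Combine`, the tags `Add` and `Mul`, and the record `CombineDict` are hypothetical names chosen for this sketch (the standard library does something similar with `Sum` and `Product`).

```haskell
-- Class route: Int has two sensible ways to combine, so each one needs
-- a newtype tag in order to select an instance.
class Combine a where
  neutral :: a
  combine :: a -> a -> a

newtype Add = Add Int deriving (Eq, Show)
newtype Mul = Mul Int deriving (Eq, Show)

instance Combine Add where
  neutral = Add 0
  combine (Add x) (Add y) = Add (x + y)

instance Combine Mul where
  neutral = Mul 1
  combine (Mul x) (Mul y) = Mul (x * y)

combineAll :: Combine a => [a] -> a
combineAll = foldr combine neutral

-- Record route: several dictionaries for the same type, no tags needed.
data CombineDict a = CombineDict { neutralOf :: a, combineOf :: a -> a -> a }

addDict, mulDict :: CombineDict Int
addDict = CombineDict 0 (+)
mulDict = CombineDict 1 (*)

combineAllWith :: CombineDict a -> [a] -> a
combineAllWith d = foldr (combineOf d) (neutralOf d)
```

With the class, the caller has to wrap every element (`map Add xs`) and unwrap the result; with the record, the caller just names the dictionary it wants.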
Of course, things get muddier: what if I want to have constraints on my types? Let's pick a real example:
class (Bounded i, Enum i) => Partition a i where
  index :: a -> i
I could just as easily have done
data Partition a i = Partition { index :: a -> i}
But now I've lost my constraints, and I will have to add them to specific functions instead.
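Here is a small sketch of what "adding them to specific functions" might look like for the record version. The functions `allCells` and `cells` and the example value `parity` are invented for illustration, and `cells` needs an extra `Eq i` constraint that the original class did not mention.

```haskell
data Partition a i = Partition { index :: a -> i }

-- The constraints move to the functions that actually need them.
allCells :: (Bounded i, Enum i) => Partition a i -> [i]
allCells _ = [minBound .. maxBound]

-- Group values by the cell they fall into (Eq i is an added assumption).
cells :: (Bounded i, Enum i, Eq i) => Partition a i -> [a] -> [(i, [a])]
cells p xs = [ (c, filter ((== c) . index p) xs) | c <- allCells p ]

-- Hypothetical example partition: even/odd over Int, indexed by Bool.
parity :: Partition Int Bool
parity = Partition even
```

Functions that only apply `index` never have to mention `Bounded` or `Enum` at all, which is arguably a small upside of the record form.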
Are there design guidelines that would help me out?
Inheritance. This section only applies to record class types. A record can inherit from another record. However, a record can't inherit from a class, and a class can't inherit from a record.
A final class is simply one that cannot be extended. But that imposes no other constraints on the class; it can still have mutable fields, fully encapsulate its state, etc. A record is a transparent carrier for a given tuple of state components, and is required to expose an API derived from its state description.
You create record types when you want value-based equality and comparison, don't want to copy values, and want to use reference variables. You create record struct types when you want the features of records for a type that is small enough to copy efficiently.
It's legal to implement an interface with a record.
I tend to see no issue with only requiring constraints on functions. The issue is, I suppose, that your data structure no longer models precisely what you intend it to. On the other hand, if you think of it as a data structure first and foremost, then that should matter less.
I feel like I don't necessarily still have a good grasp on the question, and this is about as vague as can be, but my rule of thumb tends to be that typeclasses are things that obey laws (or model meaning), and datatypes are things that encode a certain quantity of information.
When we want to layer behavior in complex ways, I've found that typeclasses start off enticingly, but can get painful quickly and switching to dictionary-passing makes things more straightforward. Which is to say that when we want implementations to be interoperable, then we should fall back to a uniform dictionary type.
This is take two, expanding a bit on a concrete example, but still just sort of spinning ideas...
Suppose we want to model probability distributions over the reals. Two natural representations come to mind.
A) Typeclass-driven
class PDist a where
  sample :: a -> Gen -> Double
B) Dictionary-driven
data PDist = PDist (Gen -> Double)
The former lets us do
data NormalDist = NormalDist Double Double -- mean, var
instance PDist NormalDist where...
data LognormalDist = LognormalDist Double Double
instance PDist LognormalDist where...
The latter lets us do
mkNormalDist :: Double -> Double -> PDist...
mkLognormalDist :: Double -> Double -> PDist...
In the former, we can write
data SumDist a b = SumDist a b
instance (PDist a, PDist b) => PDist (SumDist a b)...
in the latter we can simply write
sumDist :: PDist -> PDist -> PDist
So what are the tradeoffs? Typeclass-driven lets us specify, in the types, which distributions we're given. The tradeoff is that we have to construct an algebra of distributions explicitly, including new types for their combinations. Dictionary-driven doesn't let us restrict the distributions we're given (or even check that they're well-formed), but in return we can do whatever the heck we want.
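To see the two styles side by side, here is a self-contained sketch of the sum of two distributions. Several things are assumptions of mine: `Gen` is a toy seed with made-up helpers `splitGen` and `uniform01` (not a real random library), a uniform distribution stands in for the normal one to keep the code short, and the class is renamed `Dist` because a class and a type cannot share the name `PDist` within one module.

```haskell
-- Hypothetical stand-in for a random generator: just a seed.
newtype Gen = Gen Int

-- Toy linear congruential step; not suitable for real statistics.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

splitGen :: Gen -> (Gen, Gen)
splitGen (Gen s) = (Gen (nextSeed s), Gen (nextSeed (s + 1)))

-- A draw in [0, 1).
uniform01 :: Gen -> Double
uniform01 (Gen s) = fromIntegral (nextSeed s) / 2147483648

-- A) Typeclass-driven: the sum is a new type with its own instance.
class Dist a where
  sample :: a -> Gen -> Double

data Uniform = Uniform Double Double -- lower, upper bound

instance Dist Uniform where
  sample (Uniform lo hi) g = lo + (hi - lo) * uniform01 g

data SumDist a b = SumDist a b

instance (Dist a, Dist b) => Dist (SumDist a b) where
  sample (SumDist a b) g =
    let (g1, g2) = splitGen g in sample a g1 + sample b g2

-- B) Dictionary-driven: the sum is just a function on values.
data PDist = PDist (Gen -> Double)

runPDist :: PDist -> Gen -> Double
runPDist (PDist f) = f

mkUniform :: Double -> Double -> PDist
mkUniform lo hi = PDist (\g -> lo + (hi - lo) * uniform01 g)

sumDist :: PDist -> PDist -> PDist
sumDist (PDist f) (PDist h) = PDist $ \g ->
  let (g1, g2) = splitGen g in f g1 + h g2
```

In A the type `SumDist Uniform Uniform` records exactly what was combined; in B the result is just another `PDist`, and that information is gone.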
Furthermore, we can write a parseDist :: String -> PDist relatively easily, but we have to go through some angst to do the equivalent for the typeclass approach.
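One way to picture that angst: with the dictionary type every branch of the parser returns the same type, whereas with the class each branch builds a different type, so the result has to be hidden behind something like an existential wrapper. The sketch below makes several assumptions: a toy `Gen`, a made-up input syntax ("point x" and "uniform lo hi"), a `Maybe` result instead of the bare `PDist` in the signature above, and a class renamed to `Dist` so it can live next to the `PDist` type.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Text.Read (readMaybe)

newtype Gen = Gen Int -- hypothetical stand-in for a generator

-- Toy uniform draw in [0, 1); not a real random library.
uniform01 :: Gen -> Double
uniform01 (Gen s) =
  fromIntegral ((s * 1103515245 + 12345) `mod` 2147483648) / 2147483648

-- Dictionary approach: every branch returns the same type.
data PDist = PDist (Gen -> Double)

parseDist :: String -> Maybe PDist
parseDist str = case words str of
  ["point", x]        -> point <$> readMaybe x
  ["uniform", lo, hi] -> unif <$> readMaybe lo <*> readMaybe hi
  _                   -> Nothing
  where
    point x    = PDist (const x)
    unif lo hi = PDist (\g -> lo + (hi - lo) * uniform01 g)

-- Typeclass approach: each branch builds a different type, so the
-- result is wrapped in an existential.
class Dist a where
  sample :: a -> Gen -> Double

newtype Point = Point Double
data Uniform = Uniform Double Double

instance Dist Point where
  sample (Point x) _ = x

instance Dist Uniform where
  sample (Uniform lo hi) g = lo + (hi - lo) * uniform01 g

data SomeDist = forall d. Dist d => SomeDist d

parseSomeDist :: String -> Maybe SomeDist
parseSomeDist str = case words str of
  ["point", x]        -> SomeDist . Point <$> readMaybe x
  ["uniform", lo, hi] ->
    (\a b -> SomeDist (Uniform a b)) <$> readMaybe lo <*> readMaybe hi
  _                   -> Nothing
```

Note that `SomeDist` is morally the dictionary type again: once the concrete type is hidden, all a caller can do with the contents is `sample` them.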
So this is, in a sense, the typed/untyped or static/dynamic tradeoff at another level. We can give it a twist though, and argue that the typeclass, along with associated algebraic laws, specifies the semantics of a probability distribution. And the PDist type can indeed be made an instance of the PDist typeclass. Meanwhile, we can resign ourselves to using the PDist type (rather than typeclass) nearly everywhere, while thinking of it as isomorphic to the tower of instances and datatypes necessary to use the typeclass more "richly."
In fact, we can even define basic PDist functions in terms of typeclass functions, e.g.
mkNormalPDist m v = PDist (sample $ NormalDist m v)
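A minimal sketch of that sliding between representations, under some assumptions: `Gen` is a toy placeholder, the class is renamed `Dist` (a class and a type cannot both be called `PDist` in one module), and a point-mass distribution `Point` stands in for `NormalDist` so the example stays short. The helper `toPDist` is a name I made up.

```haskell
newtype Gen = Gen Int -- hypothetical stand-in for a generator

class Dist a where
  sample :: a -> Gen -> Double

data PDist = PDist (Gen -> Double)

-- The dictionary type is itself an instance of the class.
instance Dist PDist where
  sample (PDist f) = f

-- Any instance can be collapsed into the uniform dictionary type.
toPDist :: Dist a => a -> PDist
toPDist = PDist . sample

-- Hypothetical point-mass distribution: always returns the same value.
newtype Point = Point Double

instance Dist Point where
  sample (Point x) _ = x

-- A dictionary-style constructor defined via the typeclass instance.
mkPoint :: Double -> PDist
mkPoint = toPDist . Point
```

Going from an instance to a `PDist` is always possible; going back is not, since the `PDist` has forgotten which concrete distribution it came from.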
So there's lots of room in the design space to slide between the two representations as necessary...
Note: I'm not sure that I understand the OP exactly. Suggestions/comments for improvement appreciated!
Background:
When I first learned about typeclasses in Haskell, the general rule-of-thumb I picked up was that, in comparison to Java-like languages:
data are similar to classes
Here's another SO question and answer that describe guidelines for using interfaces (also some drawbacks of interface over-use). My interpretation:
I bet you already know all this.
The guidelines I try to follow for my own code are:
So in practice this means:
Example:
typeclass Show, with function show :: (Show s) => s -> String: for data that can be represented as a String.
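For instance, a small sketch with a made-up `Temperature` type and a made-up helper `report` that works for anything showable:

```haskell
-- Hypothetical data type: it just stores information.
data Temperature = Celsius Double | Kelvin Double

-- The typeclass instance says how that data is rendered as a String.
instance Show Temperature where
  show (Celsius d) = show d ++ " C"
  show (Kelvin d)  = show d ++ " K"

-- Works for any type with a Show instance, not only Temperature.
report :: Show s => [s] -> String
report = unlines . map show
```

Here the data type carries the information and the class describes one capability that many unrelated types share.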