When to use a type class, when to use a type

Tags: types, haskell, typeclass

I was revisiting a piece of code I wrote to do combinatorial search a few months ago, and noticed that there was an alternative, simpler way to do something that I'd previously achieved with a type class.

Specifically, I previously had a type class for the type of search problems, which have states of type s, actions (operations on states) of type a, an initial state, a way of getting a list of (action,state) pairs and a way of testing whether a state is a solution or not:

    class Problem p s a where
        initial   :: p s a -> s
        successor :: p s a -> s -> [(a,s)]
        goaltest  :: p s a -> s -> Bool

This is somewhat unsatisfactory, as it requires the MultiParamTypeClasses extension, and generally needs FlexibleInstances and possibly TypeSynonymInstances when you want to make instances of this class. It also clutters up your function signatures, e.g.

    pathToSolution :: Problem p s a => p s a -> [(a,s)]

I noticed today that I can get rid of the class entirely, and use a type instead, along the following lines

    data Problem s a = Problem {
        initial   :: s,
        successor :: s -> [(a,s)],
        goaltest  :: s -> Bool
    }

This doesn't require any extensions, and the function signatures look nicer:

pathToSolution :: Problem s a -> [(a,s)]

and, most importantly, I found that after refactoring my code to use this abstraction instead of a type class, I was left with 15-20% fewer lines than I had previously.
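
To make the record version concrete, here is a minimal sketch. The body of pathToSolution is an assumption (a plain breadth-first search, not the original code), and the Step type and the reach problem are hypothetical names made up for illustration.

```haskell
-- The record of functions that replaces the type class.
data Problem s a = Problem
  { initial   :: s
  , successor :: s -> [(a, s)]
  , goaltest  :: s -> Bool
  }

-- Assumed implementation: breadth-first search without a visited set.
-- Returns the (action, state) pairs from the initial state to the first
-- goal found. It may not terminate if no goal is reachable in an
-- infinite or cyclic state space.
pathToSolution :: Problem s a -> [(a, s)]
pathToSolution p = go [(initial p, [])]
  where
    go [] = []
    go ((s, path) : rest)
      | goaltest p s = reverse path
      | otherwise    =
          go (rest ++ [ (s', (a, s') : path) | (a, s') <- successor p s ])

-- Hypothetical example problem: reach a target number from 1
-- by incrementing or doubling.
data Step = Inc | Twice deriving (Eq, Show)

reach :: Int -> Problem Int Step
reach target = Problem
  { initial   = 1
  , successor = \n -> [(Inc, n + 1), (Twice, n * 2)]
  , goaltest  = (== target)
  }
```

A new problem is then just a value such as reach 5, with no instance declaration and no language extensions.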

The biggest win was in code that created abstractions using the type class - previously I had to create new data structures that wrapped the old ones in a complicated way, and then make them into instances of the Problem class (which required more language extensions) - lots of lines of code to do something relatively simple. After the refactor, I just had a couple of functions that did exactly what I wanted.

I'm now looking through the rest of the code, trying to spot instances where I can replace type classes with types, and make more wins.

My question is: in what situations will this refactoring not work? In what cases is it actually just better to use a type class rather than a data type, and how can you recognise those situations ahead of time, so you don't have to go through a costly refactoring?

373

asked Sep 05 '12 16:09

Chris Taylor

1 Answer

Consider a situation where both the type and class exist in the same program. The type can be an instance of the class, but that's rather trivial. More interesting is that you can write a function fromProblemClass :: (CProblem p s a) => p s a -> TProblem s a.

The refactoring you performed is roughly equivalent to manually inlining fromProblemClass everywhere you construct something used as a CProblem instance, and making every function that accepts a CProblem instance instead accept TProblem.

Since the only interesting parts of this refactoring are the definition of TProblem and the implementation of fromProblemClass, if you can write a similar type and function for any other class, you can likewise refactor it to eliminate the class entirely.

When does this work?

Think about the implementation of fromProblemClass. You'll essentially be partially applying each function of the class to a value of the instance type, and in the process eliminating any reference to the p parameter (which is what the type replaces).

Any situation where refactoring away a type class is straightforward is going to follow a similar pattern.
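
As a rough sketch of that partial application, assuming the class and record from the question are renamed CProblem and TProblem so that both can coexist. The t-prefixed field names and the CountTo instance are hypothetical, added only so the two sets of names don't clash and there is something to convert.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- The class from the question, renamed CProblem.
class CProblem p s a where
  initial   :: p s a -> s
  successor :: p s a -> s -> [(a, s)]
  goaltest  :: p s a -> s -> Bool

-- The record from the question, renamed TProblem.
data TProblem s a = TProblem
  { tInitial   :: s
  , tSuccessor :: s -> [(a, s)]
  , tGoaltest  :: s -> Bool
  }

-- Each class method is partially applied to the instance value,
-- so the p parameter disappears from the result type.
fromProblemClass :: CProblem p s a => p s a -> TProblem s a
fromProblemClass p = TProblem
  { tInitial   = initial p
  , tSuccessor = successor p
  , tGoaltest  = goaltest p
  }

-- Hypothetical instance: count upwards from a start value to a target.
-- The s and a parameters are phantom, present only to fit the class.
data CountTo s a = CountTo Int Int

instance CProblem CountTo Int () where
  initial   (CountTo start _)  = start
  successor _ n                = [((), n + 1)]
  goaltest  (CountTo _ target) = (== target)
```

For example, fromProblemClass (CountTo 0 3 :: CountTo Int ()) has type TProblem Int (), with no trace of CountTo left in the type.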

When is this counterproductive?

Imagine a simplified version of Show, with only the show function defined. This permits the same refactoring, applying show and replacing each instance with... a String. Clearly we've lost something here--namely, the ability to work with the original types and convert them to a String at various points. The value of Show is that it's defined on a wide variety of unrelated types.

As a rule of thumb, if there are many different functions specific to the types which are instances of the class, and these are often used in the same code as the class functions, delaying the conversion is useful. If there's a sharp dividing line between code that treats the types individually and code that uses the class, conversion functions might be more appropriate with a type class being a minor syntactic convenience. If the types are used almost exclusively through the class functions, the type class is probably completely superfluous.
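
A small sketch of the Show case, using a hypothetical one-method class SimpleShow in place of the real Show. TShow, fromShowClass and describeDoubled are likewise made-up names.

```haskell
-- Hypothetical one-method stand-in for Show.
class SimpleShow a where
  simpleShow :: a -> String

instance SimpleShow Bool where
  simpleShow True  = "yes"
  simpleShow False = "no"

instance SimpleShow Int where
  simpleShow = show

-- Applying the refactoring: the only method, partially applied to a
-- value, is just a String, so the "record of functions" is a String.
newtype TShow = TShow { shown :: String }

fromShowClass :: SimpleShow a => a -> TShow
fromShowClass x = TShow (simpleShow x)

-- With the class, conversion can be delayed: the value is still an Int,
-- so Int-specific operations remain available before showing it.
describeDoubled :: Int -> String
describeDoubled n = simpleShow (n * 2)
```

Once a value has been turned into a TShow, the doubling in describeDoubled is no longer expressible, because only the String is left.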

When is this impossible?

Incidentally, the refactoring here is similar to the difference between a class and interface in OO languages; similarly, the type classes where this refactoring is impossible are those which can't be expressed directly at all in many OO languages.

More to the point, some examples of things you can't translate easily, if at all, in this manner:

The class's type parameter appearing only in covariant position, such as the result type of a function or as a non-function value. Notable offenders here are mempty for Monoid and return for Monad.
The class's type parameter appearing more than once in a function's type. This may not make the refactoring truly impossible, but it complicates matters quite severely. Notable offenders here include Eq, Ord, and basically every numeric class.
Non-trivial use of higher kinds, the specifics of which I'm not sure how to pin down, but (>>=) for Monad is a notable offender here. On the other hand, the p parameter in your class is not an issue.
Non-trivial use of multi-parameter type classes, which I'm also uncertain how to pin down and gets horrendously complicated in practice anyway, being comparable to multiple dispatch in OO languages. Again, your class doesn't have an issue here.

Note that, given the above, this refactoring is not even possible for many of the standard type classes, and would be counterproductive for the few exceptions. This is not a coincidence. :]
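
To illustrate the case where the type parameter appears more than once, here is a sketch with a hypothetical Shape class. All names are invented for the example.

```haskell
-- Hypothetical class whose second method mentions the instance type twice.
class Shape a where
  area   :: a -> Double
  sameAs :: a -> a -> Bool

-- Partially applying both methods to one value turns area into a plain
-- Double, but sameAs still expects a second value of the original type,
-- so the type parameter cannot be dropped from the record.
data TShape a = TShape
  { tArea   :: Double
  , tSameAs :: a -> Bool
  }

fromShapeClass :: Shape a => a -> TShape a
fromShapeClass x = TShape { tArea = area x, tSameAs = sameAs x }

newtype Circle = Circle Double
newtype Square = Square Double

instance Shape Circle where
  area (Circle r) = pi * r * r
  sameAs (Circle r) (Circle r') = r == r'

instance Shape Square where
  area (Square s) = s * s
  sameAs (Square s) (Square s') = s == s'
```

TShape Circle and TShape Square are still different types, so a list mixing the two is rejected, and the main benefit of the refactoring is lost unless sameAs is dropped or redesigned.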

What do you give up by applying this refactoring?

You give up the ability to distinguish between the original types. This sounds obvious, but it's potentially significant--if there are any situations where you really need to control which of the original class instance types was used, applying this refactoring loses some degree of type safety, which you can only recover by jumping through the same sort of hoops used elsewhere to ensure invariants at run-time.

Conversely, if there are situations where you really need to make the various instance types interchangeable--the convoluted wrapping you mentioned being a classic symptom of this--you gain a great deal by throwing away the original types. This is most often the case where you don't actually care much about the original data itself, but rather about how it lets you operate on other data; thus using records of functions directly is more natural than an extra layer of indirection.
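
As a sketch of what that buys you with the record from the question: wrappers become ordinary functions from one Problem to another. The combinators restrictGoal and withDepth and the example countUp are hypothetical, not taken from the original code.

```haskell
-- The record from the question.
data Problem s a = Problem
  { initial   :: s
  , successor :: s -> [(a, s)]
  , goaltest  :: s -> Bool
  }

-- Hypothetical combinator: the same problem with a stricter goal.
restrictGoal :: (s -> Bool) -> Problem s a -> Problem s a
restrictGoal extra p = p { goaltest = \s -> goaltest p s && extra s }

-- Hypothetical combinator: pair every state with its search depth.
-- With the class this would need a new wrapper type plus an instance.
withDepth :: Problem s a -> Problem (s, Int) a
withDepth p = Problem
  { initial   = (initial p, 0)
  , successor = \(s, d) -> [ (a, (s', d + 1)) | (a, s') <- successor p s ]
  , goaltest  = goaltest p . fst
  }

-- Hypothetical example problem: count upwards until the state exceeds 2.
countUp :: Problem Int ()
countUp = Problem
  { initial   = 0
  , successor = \n -> [((), n + 1)]
  , goaltest  = (> 2)
  }
```

Both combinators accept any Problem value, however it was built, which is exactly the interchangeability described above.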

As noted above, this relates closely to OOP and the type of problems it's best suited to, as well as representing the "other side" of the Expression Problem from what's typical in ML-style languages.

200

answered Oct 12 '22 03:10

C. A. McCann
