Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

QuickCheck: How to use exhaustiveness checker to prevent forgotten constructors of a sum type

I have a Haskell data type like

data Mytype
  = C1
  | C2 Char
  | C3 Int String

If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).

I now want to write a QuickCheck Arbitrary instance to generate MyTypes like:

instance Arbitrary Mytype where
  arbitrary = do
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, thus having my tests not test that alternative.

I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.

The best I've come up with is

arbitrary = do
  x <- elements [C1, C2 undefined, C3 undefined undefined]
  case x of
    C1     -> C1
    C2 _   -> C2 <$> arbitrary
    C3 _ _ -> C3 <$> arbitrary <*> someCustomGen

But it doesn't really feel elegant.

I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.

like image 823
nh2 Avatar asked Aug 27 '14 19:08

nh2


3 Answers

I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:

instance Arbitrary Mytype where
  arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
      [ pure C1
      , C2 <$> arbitrary
      , C3 <$> arbitrary <*> arbitrary
      ]
    |])

It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary style Gens). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.

This is how it's implemented with TH:

exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
  tyInfo <- reify tyName
  let conNames = case tyInfo of
        TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
        _ -> fail "exhaustivenessCheck: Can only handle simple data declarations"

  list <- qList
  case list of
    input@(ListE l) -> do
      -- We could be more specific by searching for `ConE`s in `l`
      let cons = toListOf tinplate l :: [Name]
      case filter (`notElem` cons) conNames of
        [] -> return input
        missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
    _ -> fail "exhaustivenessCheck: argument must be a list"

I'm using GHC.Generics to easily traverse the syntax tree of the Exp: With toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.

I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8 - Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it but for primitive types like Integer I had to copy-paste instances as Int has them).

The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.

One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.

like image 63
nh2 Avatar answered Nov 17 '22 03:11

nh2


You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.

In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:

allCons xs = length xs > 100 ==> length constructors == 3
             where constructors = nubBy eqCons xs
                   eqCons  C1       C1      = True
                   eqCons  C1       _       = False
                   eqCons (C2 _)   (C2 _)   = True
                   eqCons (C2 _)    _       = False
                   eqCons (C3 _ _) (C3 _ _) = True
                   eqCons (C3 _ _)  _       = False

This is pretty naive, but it's a good first shot. Its advantages:

  • eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
  • It checks that your instance is handling all constructors, which is what you want
  • It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
  • It also checks that your instance is usable, eg. doesn't hang

Its disadvantages:

  • Requires a large amount of test data, in order to filter out those with length > 100
  • eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
  • Uses magic numbers 100 and 3
  • Not very generic

There are ways to improve this, eg. we can compute the constructors using the Data.Data module:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = dataTypeConstrs (head xs)

This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.

If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = length . nub . map toConstr $ xs
                   consCount    = length . dataTypeConstrs $ case head xs of
                                                                  x@(C1)     -> x
                                                                  x@(C2 _)   -> x
                                                                  x@(C3 _ _) -> x

Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!

We can solve that quite easily using a newtype:

consCount = length (dataTypeConstrs C1)

newtype MyTypeList = MTL [MyType] deriving (Eq,Show)

instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
  shrink (MTL xs) = MTL (shrink <$> xs)

allCons (MTL xs) = length constructors == consCount
                   where constructors = length . nub . map toConstr $ xs

We can put a simple exhaustiveness check in there somewhere if we like, eg.

instance Arbitrary MyTypeList where
  arbitrary = do x <- arbitrary
                 MTL <$> vectorOf (100 * consCount) getT
              where getT = do x <- arbitrary
                              return $ case x of
                                            C1     -> x
                                            C2 _   -> x
                                            C3 _ _ -> x
  shrink (MTL xs) = MTL (shrink <$> xs)
like image 21
Warbo Avatar answered Nov 17 '22 03:11

Warbo


Here I exploit an unused variable _x. This is not really more elegant than your solution, though.

instance Arbitrary Mytype where
  arbitrary = do
    let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
    n <- choose (1, 3 :: Int)
    case n of
      1 -> C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.

Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.

If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.

like image 1
chi Avatar answered Nov 17 '22 03:11

chi