I have a Haskell data type like
data Mytype
= C1
| C2 Char
| C3 Int String
If I case
on a Mytype
and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).
I now want to write a QuickCheck Arbitrary
instance to generate MyTypes
like:
instance Arbitrary Mytype where
arbitrary = do
n <- choose (1, 3 :: Int)
case n of
1 -> C1
2 -> C2 <$> arbitrary
3 -> C3 <$> arbitrary <*> someCustomGen
The problem with this is that I can add a new alternative to Mytype
and forget to update the Arbitrary instance, thus having my tests not test that alternative.
I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.
The best I've come up with is
arbitrary = do
x <- elements [C1, C2 undefined, C3 undefined undefined]
case x of
C1 -> C1
C2 _ -> C2 <$> arbitrary
C3 _ _ -> C3 <$> arbitrary <*> someCustomGen
But it doesn't really feel elegant.
I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.
I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:
instance Arbitrary Mytype where
arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
[ pure C1
, C2 <$> arbitrary
, C3 <$> arbitrary <*> arbitrary
]
|])
It works like this: You give it a type name (like ''Mytype
) and an expression (in my case a list of arbitrary
style Gen
s). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.
This is how it's implemented with TH:
exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
tyInfo <- reify tyName
let conNames = case tyInfo of
TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
_ -> fail "exhaustivenessCheck: Can only handle simple data declarations"
list <- qList
case list of
input@(ListE l) -> do
-- We could be more specific by searching for `ConE`s in `l`
let cons = toListOf tinplate l :: [Name]
case filter (`notElem` cons) conNames of
[] -> return input
missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
_ -> fail "exhaustivenessCheck: argument must be a list"
I'm using GHC.Generics
to easily traverse the syntax tree of the Exp
: With toListOf tinplate exp :: [Name]
(from lens
) I can easily find all Name
s in the whole exp
.
I was surprised that the types from Language.Haskell.TH
do not have Generic
instances, and neither (with current GHC 7.8) do Integer
or Word8
- Generic
instances for these are required because they appear in Exp
. So I added them as orphan instances (for most things, StandaloneDeriving
does it but for primitive types like Integer
I had to copy-paste instances as Int
has them).
The solution is not perfect because it doesn't use the exhaustiveness checker like case
does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.
One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck
inside each Arbitrary instance.
You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.
In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:
allCons xs = length xs > 100 ==> length constructors == 3
where constructors = nubBy eqCons xs
eqCons C1 C1 = True
eqCons C1 _ = False
eqCons (C2 _) (C2 _) = True
eqCons (C2 _) _ = False
eqCons (C3 _ _) (C3 _ _) = True
eqCons (C3 _ _) _ = False
This is pretty naive, but it's a good first shot. Its advantages:
eqCons
will trigger an exhaustiveness warning if new constructors are added, which is what you wantIts disadvantages:
eqCons
is quite verbose, since a catch-all eqCons _ _ = False
would bypass the exhaustiveness checkThere are ways to improve this, eg. we can compute the constructors using the Data.Data module:
allCons xs = sufficient ==> length constructors == consCount
where sufficient = length xs > 100 * consCount
constructors = length . nub . map toConstr $ xs
consCount = dataTypeConstrs (head xs)
This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.
If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:
allCons xs = sufficient ==> length constructors == consCount
where sufficient = length xs > 100 * consCount
constructors = length . nub . map toConstr $ xs
consCount = length . dataTypeConstrs $ case head xs of
x@(C1) -> x
x@(C2 _) -> x
x@(C3 _ _) -> x
Notice that we use consCount to eliminate the magic 3
completely. The magic 100
(which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!
We can solve that quite easily using a newtype:
consCount = length (dataTypeConstrs C1)
newtype MyTypeList = MTL [MyType] deriving (Eq,Show)
instance Arbitrary MyTypeList where
arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
shrink (MTL xs) = MTL (shrink <$> xs)
allCons (MTL xs) = length constructors == consCount
where constructors = length . nub . map toConstr $ xs
We can put a simple exhaustiveness check in there somewhere if we like, eg.
instance Arbitrary MyTypeList where
arbitrary = do x <- arbitrary
MTL <$> vectorOf (100 * consCount) getT
where getT = do x <- arbitrary
return $ case x of
C1 -> x
C2 _ -> x
C3 _ _ -> x
shrink (MTL xs) = MTL (shrink <$> xs)
Here I exploit an unused variable _x
. This is not really more elegant than your solution, though.
instance Arbitrary Mytype where
arbitrary = do
let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
n <- choose (1, 3 :: Int)
case n of
1 -> C1
2 -> C2 <$> arbitrary
3 -> C3 <$> arbitrary <*> someCustomGen
Of course, one has to keep the last case
coherent with the dummy definition of _x
, so it is not completely DRY.
Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf
are the expected ones. This assert has to be kept coherent with the Arbitrary
instance, so this is not completely DRY either.
If you do not need custom generators, I believe Data.Data
can be exploited to generate Arbitrary
instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With