I'm writing a MUD server in Haskell (MUD = Multi User Dungeon: basically, a multi-user text adventure/role-playing game). The game world data/state is represented in about 15 different IntMap
s. My monad transformer stack looks like this: ReaderT MudData IO
, where the MudData
type is a record type containing the IntMap
s, each in its own TVar
(I'm using STM for concurrency):
data MudData = MudData { _armorTblTVar :: TVar (IntMap Armor)
, _clothingTblTVar :: TVar (IntMap Clothing)
, _coinsTblTVar :: TVar (IntMap Coins)
...and so on. (I'm using lenses, thus the underscores.)
Some functions need certain IntMap
s, while other functions need others. Thus, having each IntMap
in its own TVar
provides granularity.
However, a pattern has emerged in my code. In the functions that handle player commands, I need to read (and sometimes later write) to my TVar
s within the STM monad. Thus these functions end up having an STM helper defined in their where
blocks. These STM helpers often have quite a few readTVar
operations in them, as most commands need to access a handful of the IntMap
s. Furthermore, a function for a given command may call out to a number of pure helper functions that also need some or all of the IntMap
s. These pure helper functions thus sometimes end up taking a lot of arguments (sometimes over 10).
So, my code has become "littered" with lots of readTVar
expressions and functions that take a large number of arguments. Here are my questions: is this a code smell? Am I missing some abstraction that would make my code more elegant? Is there a more ideal way to structure my data/code?
Thanks!
The solution to this problem is in changing the pure helper functions. We don't really want them to be pure, we want to leak out a single side-effect - whether or not they read specific pieces of data.
Let's say we have a pure function that uses only clothing and coins:
moreVanityThanWealth :: IntMap Clothing -> IntMap Coins -> Bool
moreVanityThanWealth clothing coins = ...
It's usually nice to know that a function only cares about e.g. clothing and coins, but in your case this knowledge is irrelevant and is just creating headaches. We are going to deliberately forget this detail. If we followed mb14's suggestion, we would pass an entire pure MudData'
like the following to the helper functions.
data MudData' = MudData' { _armorTbl :: IntMap Armor
, _clothingTbl :: IntMap Clothing
, _coinsTbl :: IntMap Coins
moreVanityThanWealth :: MudData' -> Bool
moreVanityThanWealth md =
let clothing = _clothingTbl md
coins = _coinsTbl md
in ...
MudData
and MudData'
are almost identical to each other. One of them wraps its fields in TVar
s and the other one doesn't. We can modify MudData
so that it takes an extra type parameter (of kind * -> *
) for what to wrap the fields in. MudData
will have the slightly unusual kind (* -> *) -> *
, which is closely related to lenses but doesn't have much library support. I call this pattern a Model.
data MudData f = MudData { _armorTbl :: f (IntMap Armor)
, _clothingTbl :: f (IntMap Clothing)
, _coinsTbl :: f (IntMap Coins)
We can recover the original MudData
with MudData TVar
. We can recreate the pure version by wrapping the fields in Identity
, newtype Identity a = Identity {runIdentity :: a}
. In terms of MudData Identity
, our function would be written as
moreVanityThanWealth :: MudData Identity -> Bool
moreVanityThanWealth md =
let clothing = runIdentity . _clothingTbl $ md
coins = runIdentity . _coinsTbl $ md
in ...
We've successfully forgotten which parts of the MudData
we've used, but now we don't have the lock granularity we want. We need to recover, as a side effect, exactly what we just forgot. If we wrote the STM
version of the helper it would look like
moreVanityThanWealth :: MudData TVar -> STM Bool
moreVanityThanWealth md =
do
clothing <- readTVar . _clothingTbl $ md
coins <- readTVar . _coinsTbl $ md
return ...
This STM
version for MudData TVar
is almost exactly the same as the pure version we just wrote for MudData Identity
. They only differ by the type of the reference (TVar
vs. Identity
), what function we use to get the values out of the references (readTVar
vs runIdentity
), and how the result is returned (in STM
or as a plain value). It would be nice if the same function could be used to provide both. We are going to extract what is common between the two functions. To do so, we'll introduce a type class MonadReadRef r m
for the Monad
s we can read some type of reference from. r
is the type of the reference, readRef
is the function to get the values out of the references, and m
is how the result is returned. The following MonadReadRef
is closely related to the MonadRef
class from ref-fd.
{-# LANGUAGE FunctionalDependencies #-}
class Monad m => MonadReadRef r m | m -> r where
readRef :: r a -> m a
As long as code is parameterized over all MonadReadRef r m
s, it is pure. We can see this by running it with the following instance of MonadReadRef
for ordinary values held in an Identity
. The id
in readRef = id
is the same as return . runIdentity
.
instance MonadReadRef Identity Identity where
readRef = id
We'll rewrite moreVanityThanWealth
in terms of MonadReadRef
.
moreVanityThanWealth :: MonadReadRef r m => MudData r -> m Bool
moreVanityThanWealth md =
do
clothing <- readRef . _clothingTbl $ md
coins <- readRef . _coinsTbl $ md
return ...
When we add a MonadReadRef
instance for TVar
s in STM
, we can use these "pure" computations in STM
but leak the side-effect of which TVar
s were read.
instance MonadReadRef TVar STM where
readRef = readTVar
Yes, this obviously makes your code complex and clutters the important code with a lot of boilerplate details. And functions with more than 4 arguments are a sign of problems.
I'd ask the question: Do you really gain anything by having separate TVar
s? Isn't it a case of premature optimization? Before taking such a design decision as splitting your data structure among multiple separate TVar
s, I'd definitely do some measurements (see criterion). You can create a sample test that models the expected number of concurrent threads and frequency of data updates and check what are you really gaining or losing by having multiple TVar
s vs a single one vs an IORef
.
Keep in mind:
STM
transaction, the transactions can get restarted several times before they manage to successfully complete. So under some circumstances, having multiple locks can actually make things worse.IORef
instead. It's atomic operations are very fast, which could compensate for having a single central lock.STM
or a IORef
transaction for a long time. The reason is laziness: You only need to create thunks within such a transaction, not to evaluate them. This is true in particular for a single atomic IORef
. The thunks are evaluated outside such transactions (by a thread that inspects them, or you can decide to force them at some point, if you need more control; this can be desired in your case, as if your system evolves without anybody observing it, you can easily accumulate unevaluated thunks).If it turns out that having multiple TVar
s is indeed crucial, then I'd probably write all the code in a custom monad (as described by @Cirdec while I was writing my answer), whose implementation would be hidden from the main code, and which would provide functions for reading (and perhaps also writing) parts of the state. It'd then be run as a single STM
transaction, reading and writing only what's needed, and you could have a pure version of the monad for testing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With