Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell code littered with TVar operations and functions taking many arguments: code smell?

Tags:

haskell

stm

tvar

I'm writing a MUD server in Haskell (MUD = Multi User Dungeon: basically, a multi-user text adventure/role-playing game). The game world data/state is represented in about 15 different IntMaps. My monad transformer stack looks like this: ReaderT MudData IO, where the MudData type is a record type containing the IntMaps, each in its own TVar (I'm using STM for concurrency):

data MudData = MudData { _armorTblTVar    :: TVar (IntMap Armor)
                       , _clothingTblTVar :: TVar (IntMap Clothing)
                       , _coinsTblTVar    :: TVar (IntMap Coins)

...and so on. (I'm using lenses, thus the underscores.)

Some functions need certain IntMaps, while other functions need others. Thus, having each IntMap in its own TVar provides granularity.

However, a pattern has emerged in my code. In the functions that handle player commands, I need to read (and sometimes later write) to my TVars within the STM monad. Thus these functions end up having an STM helper defined in their where blocks. These STM helpers often have quite a few readTVar operations in them, as most commands need to access a handful of the IntMaps. Furthermore, a function for a given command may call out to a number of pure helper functions that also need some or all of the IntMaps. These pure helper functions thus sometimes end up taking a lot of arguments (sometimes over 10).

So, my code has become "littered" with lots of readTVar expressions and functions that take a large number of arguments. Here are my questions: is this a code smell? Am I missing some abstraction that would make my code more elegant? Is there a more ideal way to structure my data/code?

Thanks!

like image 742
the-konapie Avatar asked Mar 07 '15 17:03

the-konapie


2 Answers

The solution to this problem is in changing the pure helper functions. We don't really want them to be pure, we want to leak out a single side-effect - whether or not they read specific pieces of data.

Let's say we have a pure function that uses only clothing and coins:

moreVanityThanWealth :: IntMap Clothing -> IntMap Coins -> Bool
moreVanityThanWealth clothing coins = ...

It's usually nice to know that a function only cares about e.g. clothing and coins, but in your case this knowledge is irrelevant and is just creating headaches. We are going to deliberately forget this detail. If we followed mb14's suggestion, we would pass an entire pure MudData' like the following to the helper functions.

data MudData' = MudData' { _armorTbl    :: IntMap Armor
                         , _clothingTbl :: IntMap Clothing
                         , _coinsTbl    :: IntMap Coins

moreVanityThanWealth :: MudData' -> Bool
moreVanityThanWealth md =
    let clothing = _clothingTbl md
        coins    = _coinsTbl    md
    in  ...

MudData and MudData' are almost identical to each other. One of them wraps its fields in TVars and the other one doesn't. We can modify MudData so that it takes an extra type parameter (of kind * -> *) for what to wrap the fields in. MudData will have the slightly unusual kind (* -> *) -> *, which is closely related to lenses but doesn't have much library support. I call this pattern a Model.

data MudData f = MudData { _armorTbl    :: f (IntMap Armor)
                         , _clothingTbl :: f (IntMap Clothing)
                         , _coinsTbl    :: f (IntMap Coins)

We can recover the original MudData with MudData TVar. We can recreate the pure version by wrapping the fields in Identity, newtype Identity a = Identity {runIdentity :: a}. In terms of MudData Identity, our function would be written as

moreVanityThanWealth :: MudData Identity -> Bool
moreVanityThanWealth md =
    let clothing = runIdentity . _clothingTbl $ md
        coins    = runIdentity . _coinsTbl    $ md
    in  ...

We've successfully forgotten which parts of the MudData we've used, but now we don't have the lock granularity we want. We need to recover, as a side effect, exactly what we just forgot. If we wrote the STM version of the helper it would look like

moreVanityThanWealth :: MudData TVar -> STM Bool
moreVanityThanWealth md =
    do
        clothing <- readTVar . _clothingTbl $ md
        coins    <- readTVar . _coinsTbl    $ md
        return ...

This STM version for MudData TVar is almost exactly the same as the pure version we just wrote for MudData Identity. They only differ by the type of the reference (TVar vs. Identity), what function we use to get the values out of the references (readTVar vs runIdentity), and how the result is returned (in STM or as a plain value). It would be nice if the same function could be used to provide both. We are going to extract what is common between the two functions. To do so, we'll introduce a type class MonadReadRef r m for the Monads we can read some type of reference from. r is the type of the reference, readRef is the function to get the values out of the references, and m is how the result is returned. The following MonadReadRef is closely related to the MonadRef class from ref-fd.

{-# LANGUAGE FunctionalDependencies #-}

class Monad m => MonadReadRef r m | m -> r where
    readRef :: r a -> m a

As long as code is parameterized over all MonadReadRef r ms, it is pure. We can see this by running it with the following instance of MonadReadRef for ordinary values held in an Identity. The id in readRef = id is the same as return . runIdentity.

instance MonadReadRef Identity Identity where
    readRef = id

We'll rewrite moreVanityThanWealth in terms of MonadReadRef.

moreVanityThanWealth :: MonadReadRef r m => MudData r -> m Bool
moreVanityThanWealth md =
    do
        clothing <- readRef . _clothingTbl $ md
        coins    <- readRef . _coinsTbl    $ md
        return ...

When we add a MonadReadRef instance for TVars in STM, we can use these "pure" computations in STM but leak the side-effect of which TVars were read.

instance MonadReadRef TVar STM where
    readRef = readTVar
like image 103
Cirdec Avatar answered Nov 11 '22 12:11

Cirdec


Yes, this obviously makes your code complex and clutters the important code with a lot of boilerplate details. And functions with more than 4 arguments are a sign of problems.

I'd ask the question: Do you really gain anything by having separate TVars? Isn't it a case of premature optimization? Before taking such a design decision as splitting your data structure among multiple separate TVars, I'd definitely do some measurements (see criterion). You can create a sample test that models the expected number of concurrent threads and frequency of data updates and check what are you really gaining or losing by having multiple TVars vs a single one vs an IORef.

Keep in mind:

  • If there are multiple threads competing for common locks in a STM transaction, the transactions can get restarted several times before they manage to successfully complete. So under some circumstances, having multiple locks can actually make things worse.
  • If there is ultimately just one data structure that you need to synchronize, you might consider using a single IORef instead. It's atomic operations are very fast, which could compensate for having a single central lock.
  • In Haskell it's surprisingly difficult for a pure function to block an atomic STM or a IORef transaction for a long time. The reason is laziness: You only need to create thunks within such a transaction, not to evaluate them. This is true in particular for a single atomic IORef. The thunks are evaluated outside such transactions (by a thread that inspects them, or you can decide to force them at some point, if you need more control; this can be desired in your case, as if your system evolves without anybody observing it, you can easily accumulate unevaluated thunks).

If it turns out that having multiple TVars is indeed crucial, then I'd probably write all the code in a custom monad (as described by @Cirdec while I was writing my answer), whose implementation would be hidden from the main code, and which would provide functions for reading (and perhaps also writing) parts of the state. It'd then be run as a single STM transaction, reading and writing only what's needed, and you could have a pure version of the monad for testing.

like image 45
Petr Avatar answered Nov 11 '22 12:11

Petr