Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.

I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).

During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.

I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).

I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?

like image 757
Cybis Avatar asked Mar 03 '09 19:03

Cybis


People also ask

What is Haskell monad?

In Haskell a monad is represented as a type constructor (call it m ), a function that builds values of that type ( a -> m a ), and a function that combines values of that type with computations that produce values of that type to produce a new computation for values of that type ( m a -> (a -> m b) -> m b ).

What is state monad?

The state monad is a built in monad in Haskell that allows for chaining of a state variable (which may be arbitrarily complex) through a series of function calls, to simulate stateful code. It is defined as: newtype State s a = State { runState :: (s -> (a,s)) }

How does state work in Haskell?

The Haskell type State describes functions that consume a state and produce both a result and an updated state, which are given back in a tuple. Here, s is the type of the state, and a the type of the produced result.

How does Haskell deal with side effects?

Haskell is a pure language Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on.


2 Answers

I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.

Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

empyGraph    :: Graph mkLabel      :: Label -> Graph mkAssignment :: Assignment -> Graph  -- modify a register or memory mkTransfer   :: ControlTransfer -> Graph   -- any control transfer (<*>)        :: Graph -> Graph -> Graph 

But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

withFreshLabel :: (Label -> Graph) -> Graph mkIfThenElse :: (Label -> Label -> Graph) -- branch condition              -> Graph   -- code in the 'then' branch              -> Graph   -- code in the 'else' branch               -> Graph   -- resulting if-then-else construct 

The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

type Graph = RealGraph -> [Label] -> (RealGraph, [Label]) 

or a bit more accurately

type Graph = RealGraph -> State [Label] RealGraph   -- a Graph is a monadic function from a successor RealGraph to a new RealGraph 

With the state monad hidden behind a layer of abstraction, it's not smelly at all!

like image 165
Norman Ramsey Avatar answered Sep 20 '22 18:09

Norman Ramsey


I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.

An fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

  • VM: the virtual machine that ran the simulation program
  • Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
  • Solution: generation of the solution file based on the output of the controllers
  • Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

The key point here was that the the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.

One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten in to it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.

like image 29
cjs Avatar answered Sep 21 '22 18:09

cjs