I'm attempting to construct an abstract syntax tree that allows definition using the monad do
notation like so:
ast = do
Variable uint8 "i"
Function Void "f" $ do
Variable uint8 "local_y"
Comment "etc. etc."
The construction I show here was gleaned from Text.Blaze.Html, where it is used to define an HTML tree.
Questions are scattered throughout the following. The main question is how to do this correctly. Any input that helps with understanding this construct is greatly appreciated, of course.
So, first of all, here's a small, flawed, but 'working' example. It's a syntax tree with declarations of variables and functions of certain type, comment lines, and a placeholder declaration that is used for substitutions:
{-# LANGUAGE ExistentialQuantification #-}
module Question
where
import Control.Applicative
import Data.Monoid (Monoid, (<>))
import Data.String.Utils (rstrip)
type NumberOfBits = Word
type VariableName = String
data Type = UInt NumberOfBits
| Int NumberOfBits
| Void
uint8 = UInt 8
int8 = Int 8
instance Show Type where
show (UInt w) = "uint" <> show w
show (Int w) = "int" <> show w
show Void = "void"
data TreeM a = Variable Type VariableName -- variable declaration
| Function Type VariableName (TreeM a) -- function declaration
| Comment String -- a comment
| PlaceHolder String -- a placeholder with
| forall b. Append (TreeM b) (TreeM a) -- combiner
| Empty a -- needed for what?
type Tree = TreeM ()
subTreeOf :: TreeM a -> a
subTreeOf (Variable _ _) = undefined
subTreeOf (Function _ _ t) = subTreeOf t
subTreeOf (Comment _) = undefined
subTreeOf (Empty t) = t
instance Monoid a => Monoid (TreeM a) where
mempty = Empty mempty
mappend = Append
mconcat = foldr Append mempty
instance Functor TreeM where
fmap f x = x `Append` (Empty (f (subTreeOf x))) -- fmap :: (a -> b) -> f a -> f b
instance Applicative TreeM where
pure x = Empty x
(<*>) x y = (x `Append` y) `Append` (Empty (subTreeOf x (subTreeOf y))) -- (<*>) :: f (a -> b) -> f a -> f b
(*>) = Append
instance Monad TreeM where
return x = Empty x
(>>) = Append -- not really needed: (>>) would default to (*>)
t >>= f = t `Append` (f (subTreeOf t))
indent :: String -> String
indent s = rstrip $ unlines $ map (" "<>) (lines s)
render :: TreeM a -> String
render (Variable y n) = "Variable " <> (show y) <> " " <> show n
render (Function r n t) = "Function" <> " " <> n <> " returning " <> (show r) <> ":\n" <> indent (render t)
render (PlaceHolder n) = "Placeholder \"" <> n <> "\""
render (Append t t') = (render t) <> "\n" <> (render t')
render (Empty _) = ""
-- |In input tree t substitute a PlaceHolder of name n' with the Tree t'
sub :: TreeM a -> (String, TreeM a) -> TreeM a
sub t@(PlaceHolder n) (n', t') = if n == n' then t' else t
sub (Function y n t) s = Function y n (sub t s)
--sub (Append t t') s = Append (sub t s) (sub t' s) -- Error!
sub t _ = t
code :: Tree
code = do
Variable uint8 "i"
Variable int8 "j"
Function Void "f" $ do
Comment "my function f"
Variable int8 "i1"
Variable int8 "i2"
PlaceHolder "the_rest"
main :: IO ()
main = do
putStrLn $ render code
putStrLn "\nNow apply substitution:\n"
putStrLn $ render (sub code ("the_rest", Comment "There is nothing here"))
This is (should be) a really neat way to define complex tree structures. In particular, this should be the syntactically least noisy, user-friendly, way to define a syntax tree.
In general I struggle to understand the exact meaning of the a
in TreeM a
. The way I see it the a
can be any of the types Variable
, Function
, PlaceHolder
, etc.
I note a few things that strike me as odd:
forall b. Append (TreeM b) (TreeM a)
the order of TreeM a
and TreeM b
arguments to Append
seem to be reversed. In any case, the use of a existential quantifier in a sum type looks strange. If I understand this correctly, it defines a family of constructors for TreeM
.Functor
, Applicative
and Monad
the only one actually used is the monad >>
. (This indicates that a free monad might be the right tool for this job.) It actually never occurred to me that the do notation employs the >>
operator and that this fact can be used. undefined
has to be used in subTreeOf
in order to make the function total.As already noted, the example above is flawed: Parts of the construct are not appropriate for ASTs:
Empty
makes sense for HTML trees, it is used for empty tags like <br />
. But for the AST it makes no sense. It was left as is to keep the Applicative
and Functor
implementations working.Functor
and Applicative
may make sense for HTML trees, but not for the AST. Even for HTML I don't quite understand the purpose of the fmap
and applicative <*>
. Both extend the tree by pushing down a node and adding an Empty
type. I don't quite see which natural transformation on HTML trees that represents.I'm surprised the subTreeOf x (subTreeOf y)
in the definition of the applicative <*>
is actually correct syntax, or is there an implicit >>
?
It is natural to apply transformations on ASTs. The PlaceHolder
serves as a little toy for applying a transformation. The function sub
, having only a partial implementation, here, should substitute the placeholder "the_rest" with a comment. The necessary
sub (Append t t') s = Append (sub t s) (sub t' s)
does not compile however, the expected type of s
is (String, TreeM b)
, the actual type is (String, TreeM a)
.
Changing the type to
sub :: TreeM a -> (String, TreeM b) -> TreeM a
, on the other hand, breaks the definition of sub p@(PlaceHolder n)
and now I'm stuck.
In fact, isn't this sub
exactly what fmap
for ASTs should be?
The term 'free monad' regularly pops up when monads for ASTs are discussed. But the free monad relies on the Functor
fmap
for the free construction and the fmap
shown here is not adequate for ASTs. Once the correct fmap
is identified the free monad should do the rest - maybe.
fmap
It seems that a correct fmap
is the key for success here, and the correct <*>
will probably become more obvious.
Loops can be written with forM_
, a nice way to build up repetitive parts of ASTs:
forM_ ["you", "get", "the", "idea"] $ \varName -> do
Variable uint8 varName
Conditional parts can use when
, unless
, etc.
when hasCppDestructor $ do
Comment "We need the destructor"
Function NoReturnType "~SomeClass" $ do
...
Semantic analysis, e.g. ensuring correct declaration order, is also possible as was pointed out in the first answer.
Visual clues: Another thing I like is that in the construct shown above control structures such as if-then-else, forM_
etc. start lower-case whereas the AST lines start upper-case.
A few words about where this is heading, possibly: The idea is to use a nice enough embedded DSL that allows to automatically define an AST that rather abstractly represents, say, a convoluted FSM that needs to be implemented in C, C++, Python, Java, Go, Rust, Javascript, whatever... A render
function like the one above then maps the certifiably correct AST to the target language.
>>
does not default to *>
, but to m >> k = m >>= (\_ -> k)
!I'm not sure if this whole approach is a good idea (though, I've actually though a number of times of doing something similar myself).
Note that monads like Blaze.MarkupM
, HaTeX.LaTeXM
etc. aren't really much of a monad. They're really just monoids that want access to monadic combinators (mostly, to abuse do
notation, but it also allows stack monad transformers on top, which can make quite some sense). I.e., they're nothing but specialised Writer
monads!
At the moment, you're really doing the same thing; if that's what you're intending then maybe the best way to go about it would be to just design your type as a Monoid Tree
, then look at the structure of the Writer Tree
monad and, if desired, refactor it into a TreeM
data structure. (HaTeX doesn't do this but keeps LaTeX
and LaTeXM
separate types with only a common class interface, which is arguably a cleaner approach, though it may be not optimal for performance.)
The result will be pretty like Blaze.MarkupM
/ the structure you have right now. I could discuss your individual questions, but really, they can all be answered by just looking at how the type is isomorphic to the writer monad.
Actually you don't need a Monad
instance at all to use do
, as such:
Prelude> 2 * do 1 + 1
4
So if you just want to abuse do
to avoid parentheses in a tree layout, but don't really have a sensible way of stashing bindable variables in your structure, consider not writing any monad instance. That instance is only needed for a do
block with multiple lines, but if none of these lines bind any variables then you can always just replace the implicit >>
with an explicit <>
, like
Function Void "f" $ do
Variable uint8 "local_y"
<> Comment "etc. etc."
The only problem really is: these lines can't include the $
operator, because that has lower precedence than <>
. One nifty way to circumvent this is to observe that ($) = id
, so you can write your example as
ast = do
Variable uint8 "i"
<> Function Void "f" `id`do
Variable uint8 "local_y"
<> Comment "etc. etc."
Whether this is even more abuse of syntax than defining a not-much-of-a-monad instance is debatable. IMO, if you define such a monad instance you should right away make it a monad transformer, like HaTeX
does, because that also gives the option to allow including IO
actions in your AST build (for example, to hard-include external source files).
All that said: for your application, it might actually make sense to have a Monad
instance that's not just a “sugared up monoid” but actually binds, well, variables in a useful way. That's a feature not applicable to blaze
, but certainly to C++/Python/JavaScript language like ASTs, and it could be quite useful because it ensures variables are defined before use, right within the Haskell syntax. Instead of your example, you'd write
ast = do
i <- variable uint8
Function Void "f" $ do
local_y <- variable uint8
Comment "etc. etc."
The variables would under the hood then actually just be numbered identifiers, chosen according to a state variable.
Implementation would be roughly like this:
type VariableName = Int
data TreeS = Variable Type VariableName
| Function Type VariableName TreeS
| Comment String
| PlaceHolder String
| Append TreeS TreeS
| Empty
instance Monoid where (<>) = Append
newtype TreeT m a
= TreeT { runTreeM :: StateT VariableName (WriterT TreeS m) a }
deriving (Functor, Applicative, Monad)
variable :: Type -> TreeT m VariableName
variable typ = TreeT $ do
i <- get
lift . tell $ Variable typ i
put $ i+1
return i
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With