How to build Abstract Syntax Trees from grammar specification in Haskell?

Tags: haskell, abstract-syntax-tree, zipper, parsec, happy

I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.

If I were to do this in Java, I would use a combination of JTB and JavaCC, which build an AST. Visitors are then used to manipulate the tree. But given the large number of parsing libraries and tools in Haskell (parsec, happy, alex, etc.), I'm a bit confused about choosing the appropriate one.

So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?


asked Sep 10 '13 11:09

Vamshi Surabhi

2 Answers

Well, in Haskell there are two main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF, I'd suggest the latter.

A good one is happy, usually paired with alex, which generates the lexer that feeds it. GHC's own parser IIRC is written using this pair, so you'd be in good company.
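For comparison, here is a minimal sketch of the combinator route, using ReadP from base so that nothing extra needs installing. The two-rule grammar in the comment and the names Expr, lexeme, expr, term and parseExpr are made up for illustration; they are not part of the asker's Java subset.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical toy grammar, one parser per BNF rule:
--   expr ::= term ('+' term)*
--   term ::= number | ident | '(' expr ')'
data Expr
  = Num Int
  | Var String
  | Add Expr Expr
  deriving (Show, Eq)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

-- expr: one or more terms joined by '+', associating to the left.
expr :: ReadP Expr
expr = chainl1 term (Add <$ lexeme (char '+'))

-- term: a number, an identifier, or a parenthesised expression.
term :: ReadP Expr
term = lexeme (Num . read <$> munch1 isDigit)
   +++ lexeme (Var <$> munch1 isAlpha)
   +++ between (lexeme (char '(')) (lexeme (char ')')) expr

-- Accept the input only if exactly one parse consumes all of it.
parseExpr :: String -> Maybe Expr
parseExpr s =
  case [e | (e, "") <- readP_to_S (skipSpaces *> expr <* eof) s] of
    [e] -> Just e
    _   -> Nothing

main :: IO ()
main = print (parseExpr "1 + x + (2 + 3)")
```

Each BNF rule becomes one Haskell definition, which is pleasant for small grammars. With a generator you would instead keep the rules in a separate grammar file and let the tool produce the parsing code.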

Next you'll have a big honking stack of data declarations to parse into:

data JavaClass = JavaClass {
    className :: Name,
    interfaces :: [Name],
    contents :: [ClassContents],
    ...
  }

data ClassContents = M Method
                   | F Field
                   | IC InnerClass

and for expressions and whatever else you need. Finally you'll combine these into something like

data TopLevel = JC JavaClass
              | WhateverOtherForms
              | YouWillParse

Once you have this, you'll have the entire AST represented as one TopLevel, or a list of them, depending on how many classes/files you parse.

How to proceed from here depends on what you want to do. There are a number of libraries, such as syb (scrap your boilerplate), that let you write very concise tree traversals and modifications. lens is also an option. At a minimum, check out Data.Traversable and Data.Foldable.
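As a rough illustration of what Data.Foldable and Data.Traversable buy you, here is a small sketch. It assumes an expression type parameterised over its variable type, which is a design choice of this example and not something the AST above requires. The names Expr, variables, rename and resolve are hypothetical.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.Foldable (toList)

-- A toy expression type, parameterised over the type of variables
-- so that GHC can derive the traversal instances for us.
data Expr v
  = Lit Int
  | Var v
  | Add (Expr v) (Expr v)
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- Foldable: collect every variable, left to right.
variables :: Expr v -> [v]
variables = toList

-- Functor: rewrite every variable without touching the tree shape.
rename :: (v -> w) -> Expr v -> Expr w
rename = fmap

-- Traversable: look each variable up, failing if any lookup fails.
resolve :: Eq v => [(v, Int)] -> Expr v -> Maybe (Expr Int)
resolve env = traverse (`lookup` env)

sample :: Expr String
sample = Add (Var "x") (Add (Lit 1) (Var "y"))

main :: IO ()
main = do
  print (variables sample)
  print (rename (++ "'") sample)
  print (resolve [("x", 10), ("y", 20)] sample)
```

The derived instances only see the type parameter, so this works for "do something to every variable" but not for rewriting arbitrary nodes. For the latter, generic traversals in the style of syb are a better fit.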

To modify the tree, you can do something as simple as

ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c{contents = filter (not . isInnerClass) $ contents c}
 --                    ^^^ that is called a record update
    where isInnerClass (IC _) = True
          isInnerClass _      = False

and then you could potentially use something like syb to write

 everywhere (mkT ignoreInnerClasses) toplevel

which will traverse everything and apply ignoreInnerClasses to every JavaClass it finds. This is possible to do in lens and many other libraries too, but syb is very easy to read.
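To make the traversal concrete, here is a self-contained sketch that runs with base alone. The AST is cut down to a few hypothetical constructors (methods and fields carry only a name), and everywhereT and mkTrans are hand-rolled stand-ins for syb's everywhere and mkT, written on top of Data.Data. With syb installed you would import those two from Data.Generics instead.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE RankNTypes #-}

import Data.Data (Data, Typeable, cast, gmapT)

type Name = String

-- Simplified stand-ins for the declarations sketched earlier.
data ClassContents
  = M Name          -- a method, reduced to its name
  | F Name          -- a field, reduced to its name
  | IC JavaClass    -- an inner class
  deriving (Show, Eq, Data)

data JavaClass = JavaClass
  { className :: Name
  , contents  :: [ClassContents]
  } deriving (Show, Eq, Data)

newtype TopLevel = JC JavaClass
  deriving (Show, Eq, Data)

-- Stand-in for syb's mkT: lift a transformation on one type to a
-- transformation on any type, acting as the identity elsewhere.
mkTrans :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkTrans f = case cast f of
  Just g  -> g
  Nothing -> id

-- Stand-in for syb's everywhere: transform all children first,
-- then the node itself (bottom-up).
everywhereT :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhereT f x = f (gmapT (everywhereT f) x)

-- Drop inner classes from a single class, using a record update.
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c { contents = filter (not . isInnerClass) (contents c) }
  where
    isInnerClass (IC _) = True
    isInnerClass _      = False

example :: TopLevel
example =
  JC (JavaClass "Outer" [M "run", IC (JavaClass "Inner" [F "x"]), F "count"])

stripped :: TopLevel
stripped = everywhereT (mkTrans ignoreInnerClasses) example

main :: IO ()
main = print stripped
```

The point of the generic traversal is that ignoreInnerClasses only has to know about JavaClass. The walk through TopLevel, lists and ClassContents comes for free from the derived Data instances.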


answered Oct 04 '22 20:10

Daniel Gratzer

I've never used bnfc-meta (suggested by @phg), but I would strongly recommend you look into BNFC (on Hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it automatically generates an AST, a parser, and a pretty-printer for the grammar.

How suitable BNFC is depends on the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did have some success hacking up context-sensitive extensions, but that code has likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.

answered Oct 04 '22 20:10

John L
