What's the best way to write a parser by hand? [closed]

We've used ANTLR to create a parser for a SQL-like grammar, and while the results are satisfactory in most cases, there are a few edge cases that we need to fix. Since we didn't write the parser ourselves, we don't really understand it well enough to make sensible changes.

So, we'd like to write our own parser. What's the best way to go about writing a parser by hand? What sort of parser should we use - recursive descent has been recommended; is that right? We'll be writing it in C#, so any tutorials for writing parsers in that language would be gratefully received.

UPDATE: I'd also be interested in answers that involve F# - I've been looking for a reason to use that in a project.

818

asked Feb 10 '09 09:02

Simon

3 Answers

I would highly recommend F# as your language of choice for parsing on the .NET platform. Its roots in the ML family of languages mean it has excellent support for language-oriented programming.

Discriminated unions and pattern matching allow for a very succinct and powerful specification of your AST. Higher-order functions allow for the definition of parse operations and their composition. First-class support for monadic types allows state management to be handled implicitly, greatly simplifying the composition of parsers. Powerful type inference greatly aids the definition of these (complex) types. And all of this can be specified and executed interactively, allowing you to prototype rapidly.

Stephan Tolksdorf has put this into practice with his parser combinator library, FParsec.

From his examples we see how naturally an AST is specified:

type expr =
    | Val of string
    | Int of int
    | Float of float
    | Decr of expr

type stmt =
    | Assign of string * expr
    | While of expr * stmt
    | Seq of stmt list
    | IfThen of expr * stmt
    | IfThenElse of expr * stmt * stmt
    | Print of expr

type prog = Prog of stmt list

The implementation of the parser (partially elided) is just as succinct:

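// id, expr, str, ch and ws are defined in the elided part of the example.

// Statements nest, so stmt is needed before it can be defined: create a
// forwarding parser now and fill in its implementation via stmtRef below.
let stmt, stmtRef = createParserForwardedToRef()

// One or more statements separated by ';'.
let stmtList = sepBy1 stmt (ch ';')

// p >>. q runs p then q and keeps q's result; pipe2 combines two results.
let assign =
    pipe2 id (str ":=" >>. expr) (fun id e -> Assign(id, e))

// p |>> f applies f to the result of p.
let print = str "print" >>. expr |>> Print

let pwhile =
    pipe2 (str "while" >>. expr) (str "do" >>. stmt) (fun e s -> While(e, s))

// p .>> q runs p then q and keeps p's result.
let seq =
    str "begin" >>. stmtList .>> str "end" |>> Seq

let ifthen =
    pipe3 (str "if" >>. expr) (str "then" >>. stmt) (opt (str "else" >>. stmt))
          (fun e s1 optS2 ->
               match optS2 with
               | None    -> IfThen(e, s1)
               | Some s2 -> IfThenElse(e, s1, s2))

// Tie the knot: choice tries the alternatives in the order listed.
// (On F# 6 and later, stmtRef.Value <- ... is the preferred spelling of :=.)
do stmtRef := choice [ifthen; pwhile; seq; print; assign]

let prog =
    ws >>. stmtList .>> eof |>> Prog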

In the definition of stmtList, for example, stmt and ch ';' are parsers, and sepBy1 is a monadic parser combinator that takes two simple parsers and returns a combined parser. In this case, sepBy1 p sep returns a parser that parses one or more occurrences of p separated by sep. You can thus see how quickly a powerful parser can be built up from simple parsers. F#'s support for user-defined operators also allows for concise infix notation; e.g. the sequencing combinator and the choice combinator can be written as >>. and <|>.
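To make the idea of a combinator concrete without depending on FParsec, here is a minimal sketch of how something like sepBy1 could be built by hand. The Parser type, pchar, digit, and this sepBy1 are simplified stand-ins written for illustration only. FParsec's real parsers also track stream position, user state, and error messages, none of which is modelled here.

```fsharp
// A toy parser: takes the remaining input and returns Some (value, rest)
// on success or None on failure. Hypothetical type, not FParsec's Parser.
type Parser<'a> = char list -> ('a * char list) option

// Parse one specific character.
let pchar (c: char) : Parser<char> =
    fun input ->
        match input with
        | x :: rest when x = c -> Some (x, rest)
        | _ -> None

// Parse one decimal digit.
let digit : Parser<char> =
    fun input ->
        match input with
        | x :: rest when System.Char.IsDigit x -> Some (x, rest)
        | _ -> None

// sepBy1 p sep: one or more occurrences of p separated by sep.
let sepBy1 (p: Parser<'a>) (sep: Parser<'b>) : Parser<'a list> =
    fun input ->
        match p input with
        | None -> None
        | Some (first, rest) ->
            // After the first item, keep consuming "sep p" pairs.
            let rec loop acc remaining =
                match sep remaining with
                | None -> List.rev acc, remaining
                | Some (_, afterSep) ->
                    match p afterSep with
                    // Stop before a separator that is not followed by p.
                    | None -> List.rev acc, remaining
                    | Some (v, afterP) -> loop (v :: acc) afterP
            Some (loop [first] rest)

// Two small parsers combined into a larger one.
let digits = sepBy1 digit (pchar ';')
let result = digits (List.ofSeq "1;2;3")
// result = Some (['1'; '2'; '3'], [])
```

The point is only that sepBy1 is an ordinary function from parsers to a parser; a real library adds the error reporting and backtracking rules you would otherwise have to design yourself.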

Best of luck,

Danny

200

answered Oct 19 '22 10:10

Daniel Asher

The only kind of parser that can be handwritten by a sane human being is a recursive-descent parser. That said, writing a bottom-up parser by hand is still possible, but very undesirable.

If you go with a recursive-descent parser, you have to verify that your SQL grammar is not left-recursive (and eliminate the left recursion if necessary), and then basically write a function for each grammar rule. See this for further reference.
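As a rough illustration of both steps, here is a small sketch for an arithmetic grammar rather than SQL. The Token and Expr types, the grammar, and all function names are assumptions made up for this example. The left-recursive rule is rewritten as a repetition, and each remaining rule becomes one function.

```fsharp
// Hypothetical token and AST types for a tiny expression grammar.
type Token = Num of int | Plus | Star | LParen | RParen

type Expr =
    | Lit of int
    | Add of Expr * Expr
    | Mul of Expr * Expr

// Left-recursive form (coded directly, parseExpr would call itself forever):
//   expr := expr '+' term | term
// Equivalent form without left recursion:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUM | '(' expr ')'

// Consume one expected token or fail.
let expect tok tokens =
    match tokens with
    | t :: rest when t = tok -> rest
    | _ -> failwithf "expected %A" tok

// Each function takes the remaining tokens and returns the parsed node
// together with the tokens it did not consume.
let rec parseExpr tokens =
    let left, rest = parseTerm tokens
    parseExprTail left rest

// The loop for ('+' term)*; building to the left keeps '+' left-associative.
and parseExprTail left tokens =
    match tokens with
    | Plus :: rest ->
        let right, rest' = parseTerm rest
        parseExprTail (Add (left, right)) rest'
    | _ -> left, tokens

and parseTerm tokens =
    let left, rest = parseFactor tokens
    parseTermTail left rest

// The loop for ('*' factor)*.
and parseTermTail left tokens =
    match tokens with
    | Star :: rest ->
        let right, rest' = parseFactor rest
        parseTermTail (Mul (left, right)) rest'
    | _ -> left, tokens

and parseFactor tokens =
    match tokens with
    | Num n :: rest -> Lit n, rest
    | LParen :: rest ->
        let e, afterExpr = parseExpr rest
        e, expect RParen afterExpr
    | _ -> failwith "expected a number or '('"

// 1 + 2 * 3
let ast, leftover = parseExpr [Num 1; Plus; Num 2; Star; Num 3]
// ast = Add (Lit 1, Mul (Lit 2, Lit 3)), leftover = []
```

Operator precedence falls out of the rule nesting: term sits below expr, so '*' binds tighter than '+' with no precedence table.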

answered Oct 19 '22 11:10

Anton Gogolev

Adding my voice to the chorus in favor of recursive descent (LL(1)). Such parsers are simple, fast, and, IMO, not at all hard to maintain.

However, take a good look at your language to make sure it is LL(1). If you have any syntax like C's, such as ((((type))foo)[ ]), where you might have to descend through multiple layers of parentheses before you even find out whether you are looking at a type, variable, or expression, then LL(1) will be very difficult, and bottom-up wins.
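A quick way to see what "being LL(1)" demands: the very next token must be enough to pick the grammar rule. The sketch below assumes a statement language shaped like the one in the FParsec answer (if, while, begin, print, and assignment); the Token and Production types and the predict function are hypothetical names used only for illustration.

```fsharp
// Hypothetical tokens for a small statement language.
type Token = If | While | Print | Begin | Ident of string | LParen | Other

// The grammar rules a statement parser could choose between.
type Production = IfStmt | WhileStmt | PrintStmt | Block | Assignment

// LL(1) prediction: the next token alone selects the production.
let predict (lookahead: Token) : Production option =
    match lookahead with
    | If -> Some IfStmt
    | While -> Some WhileStmt
    | Print -> Some PrintStmt
    | Begin -> Some Block
    | Ident _ -> Some Assignment
    // No statement in this grammar starts with any other token.
    | _ -> None

let choice1 = predict (Ident "x")   // Some Assignment
let choice2 = predict LParen        // None
```

If two rules of your grammar can start with the same token, as with an opening parenthesis in C that may begin a cast or an ordinary expression, a table like this cannot be written, and that is the signal that plain LL(1) recursive descent will be a struggle.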

answered Oct 19 '22 10:10

Mike Dunlavey

Tags:

c#

.net

parsing

f#