Writing a Parser (for a markup language): Theory & Practice

I'd like to write an idiomatic parser for a markup language like Markdown. My version will be slightly different, but I perceive at least a minor need for something like this in Clojure, and I'd like to get on it.

I don't want to use a mess of RegExes (though I realize some will probably be needed), and I'd like to make something both powerful and in idiomatic Clojure.

I've begun a few different attempts (mostly on paper), but I'm not terribly happy with them, as I feel as though I'm just improvising. That would be fine, but I've done plenty of exploring in Clojure over the past month or two and would like, at least in part, to follow in the paths of giants.

I'd like some pointers, or suggestions, or resources (books from O'Reilly would be awesome–love me some eBooks–but Amazon or wherever would be great, too). Whatever you can offer.

EDIT: Brian Carper has an interesting post on using ANTLR from Clojure.

There's also clojure-pg and fnparse, which are Clojure parser-generators. fnparse even looks like it's got some decent documentation.

Still looking for resources etc! Just thought I'd update these with some findings of my own.

Isaac asked Aug 06 '10 21:08


1 Answer

The best I can think of is that Terence Parr, who leads the ANTLR parser generator project, has written a markup language, documented here. In any case, there's source code there to look at.

Steve Cooper answered Oct 01 '22 01:10