
Creating a markup language like markdown [closed]

Tags: c, yacc, lex

I've been looking into creating a markup language similar to Markdown, and I'm wondering where to start. I've researched creating languages a bit and ended up with tutorials about lexers and ASTs, where in the end the language is passed to something like LLVM.

From what I understand, languages like C are imperative and languages like Markdown are declarative. What does the toolchain look like for something that probably isn't going to touch anything like LLVM?

I've seen other answers, such as how to tokenize a language in Python. How might I do this in C? I'd like to have something that can be used anywhere (e.g. integrated into a Ruby native extension, or in a C# project).

I can't seem to find a good direction to go with this. Does anybody have experience or tips on where to start? At what point, and where, would I build the "binary" (that is, generate HTML from the source text)?

Does Markdown even use a lexer? From the syntax, it looks like it could very well just use regular expressions.

Apologies if this is too broad, but I can't find much information on the topic (perhaps I'm just looking in the wrong places!).

Alexander Lozada asked Mar 11 '23 07:03


1 Answer

You are right: simple markup languages like Markdown are declarative, and very simple implementations exist that involve no lexer or AST at all.
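To give a rough idea of how small such an implementation can start out, here is a minimal sketch in C that turns one line of source directly into one HTML element, with no token stream and no tree. The function name `render_line` and the rule set (only `#` headings and paragraphs, no HTML escaping) are assumptions made for illustration; they are not taken from any existing Markdown implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert one line of source into one HTML element.
 * A line starting with 1 to 6 '#' characters followed by a space becomes
 * a heading of that level; anything else becomes a paragraph.
 * No HTML escaping is done in this sketch.
 * Returns what snprintf returns (the length the output needs). */
int render_line(const char *line, char *out, size_t cap)
{
    size_t level = strspn(line, "#");   /* number of leading '#' characters */

    if (level >= 1 && level <= 6 && line[level] == ' ') {
        const char *text = line + level + 1;   /* skip the "## " prefix */
        return snprintf(out, cap, "<h%zu>%s</h%zu>", level, text, level);
    }
    return snprintf(out, cap, "<p>%s</p>", line);
}
```

Calling `render_line("## Title", buf, sizeof buf)` fills `buf` with `<h2>Title</h2>`. Because the function only uses the C standard library and a plain `char *` interface, something of this shape could be wrapped from other languages, which is what the question asks for.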

The original Markdown implementation, for example, was a simple Perl script using regular expressions. It was written by John Gruber (the creator of Markdown) and is available here: http://daringfireball.net/projects/downloads/Markdown_1.0.1.zip
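The same regex-substitution idea can be tried in C. The sketch below is not a port of the Perl script; it is a single illustrative rule that rewrites `**text**` as `<strong>text</strong>` using the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`). The function name `render_bold` is hypothetical, and `<regex.h>` is a POSIX header, so this assumes a POSIX system rather than a plain ISO C environment.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite every **text** span as <strong>text</strong>.
 * Returns 0 on success, -1 if the pattern fails to compile or if the
 * output buffer is too small. */
int render_bold(const char *src, char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[2];    /* m[0] = whole match, m[1] = text between the stars */
    size_t used = 0;

    if (cap == 0)
        return -1;
    if (regcomp(&re, "\\*\\*([^*]+)\\*\\*", REG_EXTENDED) != 0)
        return -1;

    out[0] = '\0';
    while (regexec(&re, src, 2, m, 0) == 0) {
        /* Copy the text before the match, then the wrapped capture group. */
        int n = snprintf(out + used, cap - used, "%.*s<strong>%.*s</strong>",
                         (int)m[0].rm_so, src,
                         (int)(m[1].rm_eo - m[1].rm_so), src + m[1].rm_so);
        if (n < 0 || (size_t)n >= cap - used) {
            regfree(&re);
            return -1;
        }
        used += (size_t)n;
        src += m[0].rm_eo;      /* continue scanning after this match */
    }

    /* Copy whatever follows the last match. */
    int n = snprintf(out + used, cap - used, "%s", src);
    regfree(&re);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}
```

A full converter built this way would apply a series of such substitutions in a fixed order. The order matters, since one rule's output becomes the next rule's input.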

There is also a C implementation you can have a look at, called Discount, available here: http://www.pell.portland.or.us/~orc/Code/discount/

Both tools are open source, and their code shows what is needed to process a markup language. Each contains the whole toolchain, including the parser.

Yann Bodson answered Mar 20 '23 21:03