I've been looking into creating a markup language similar to Markdown, and I'm wondering where to start. I've done some research on creating languages, but the tutorials I've found talk about lexers and ASTs, and in the end the language is handed off to something like LLVM.
From what I understand, languages like C are imperative, while languages like Markdown are declarative. What does the toolchain look like for something that probably isn't going to touch anything like LLVM?
I've seen other answers about how to tokenize a language in Python, but how might I do this in C? I'd like something that can be used anywhere (e.g. integrated into a Ruby native extension or a C# project).
I can't seem to find a good direction to go in. Does anybody have experience or tips on where to start? At what point, and where, would I build the "binary" (i.e. generate HTML from the source text)?
Does Markdown even use a lexer? From the syntax, it looks like it could just as well use regular expressions.
Apologies if this is too broad, but I can't find much information on the topic (perhaps I'm just looking in the wrong places!)
You are right: simple markup languages like Markdown are declarative. Very simple implementations exist that involve no lexer or AST at all.
The original Markdown implementation, for example, was a simple Perl script using regular expressions. It was written by John Gruber (the creator of Markdown) and is available here: http://daringfireball.net/projects/downloads/Markdown_1.0.1.zip
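To make the no-lexer approach concrete, here is a minimal sketch in C of a line-at-a-time converter. It is not how the Perl script or Discount are actually structured; the function name `render_line` is hypothetical, and it handles only ATX headings (one to six `#` followed by a space) and plain paragraphs. It scans the string directly instead of using regular expressions so that it stays within standard C, since POSIX `<regex.h>` is not available on every platform you might embed it in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert one line of Markdown-like input to HTML.
 * Handles only ATX headings ("# Title" .. "###### Title") and plain
 * paragraphs. There is no lexer or AST: the line is scanned directly and
 * HTML is written straight into out (at most out_size bytes, including
 * the terminating NUL). Returns what snprintf returns. */
int render_line(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL && out_size > 0);

    /* Count leading '#' characters, stopping at six. */
    int level = 0;
    while (line[level] == '#' && level < 6)
        level++;

    /* A heading needs 1-6 '#' characters followed by a space. */
    if (level > 0 && line[level] == ' ') {
        const char *text = line + level + 1;
        return snprintf(out, out_size, "<h%d>%s</h%d>", level, text, level);
    }

    /* Blank line: emit nothing. */
    if (line[0] == '\0') {
        out[0] = '\0';
        return 0;
    }

    /* Anything else is treated as a one-line paragraph. */
    return snprintf(out, out_size, "<p>%s</p>", line);
}
```

For example, `render_line("## Usage", buf, sizeof buf)` leaves `<h2>Usage</h2>` in `buf`. A real converter would also join consecutive lines into one paragraph and escape HTML special characters; this sketch does neither.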
There is also a C implementation you can have a look at, called Discount, available here: http://www.pell.portland.or.us/~orc/Code/discount/
Both of these tools are fully open source and show you exactly what is needed to process a markup language, covering the whole toolchain from parsing the input to producing HTML.
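One common way to organize such a converter is to handle block-level structure (headings, paragraphs, lists) first and then inline spans within each block, writing HTML as you go rather than building a tree. The sketch below shows a hypothetical inline pass, `render_inline`, in plain C: it escapes `&`, `<` and `>` and turns `*text*` into `<em>text</em>`, leaving an unmatched `*` as-is. It assumes a caller-supplied output buffer and reports overflow instead of allocating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (current length *len) if it fits, keeping room for the
 * terminating NUL. Returns 1 on success, 0 if it would overflow. */
static int append(char *out, size_t out_size, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n >= out_size)
        return 0;
    memcpy(out + *len, s, n + 1); /* copy including the NUL */
    *len += n;
    return 1;
}

/* Hypothetical inline pass: escapes HTML special characters and turns
 * *text* into <em>text</em>. A '*' only opens emphasis if another '*'
 * follows later in the string, so the output is always balanced.
 * Returns 1 on success, 0 if out_size was too small. */
int render_inline(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);
    size_t len = 0;
    int in_em = 0;
    out[0] = '\0';

    for (const char *p = in; *p != '\0'; p++) {
        char one[2] = { *p, '\0' }; /* default: copy the character */
        const char *piece = one;

        switch (*p) {
        case '&': piece = "&amp;"; break;
        case '<': piece = "&lt;";  break;
        case '>': piece = "&gt;";  break;
        case '*':
            if (in_em) {
                piece = "</em>";
                in_em = 0;
            } else if (strchr(p + 1, '*') != NULL) {
                piece = "<em>";
                in_em = 1;
            }
            break;
        default:
            break;
        }
        if (!append(out, out_size, &len, piece))
            return 0;
    }
    return 1;
}
```

For example, `render_inline("a *b* & c", buf, sizeof buf)` produces `a <em>b</em> &amp; c`. Because it needs only a `char` buffer and the C standard library, a function like this can be wrapped in a Ruby native extension or called from C# via P/Invoke.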