How to efficiently build an interpreter (lexer+parser) in C?

Tags: c, html, parsing, lexer, interpreter

I'm trying to make a meta-language for writing markup code (such as XML and HTML) that can be embedded directly into C/C++ code. Here is a simple sample written in this language, which I call WDI (Web Development Interface):

 /*
  * Simple wdi/html sample source code
  */
 #include <mySite>

 string name = "myName";
 string toCapital(string str);

 html
 {
  head {
   title { mySiteTitle; }
   link(rel="stylesheet", href="style.css");
  }
  body(id="default") {
   // Page content wrapper
   div(id="wrapper", class="some_class") {
    h1 { "Hello, " + toCapital(name) + "!"; }

    // Lists post
    ul(id="post_list") {
     for(post in posts) {
      li { a(href=post.getID()) { post.title; } }
     }
    }
   }
  }
 }

Basically, it is modified C source with a user-friendly interface for HTML. As you can see, the traditional tag-based style is replaced by C-like commands, with blocks delimited by curly braces. I need to build an interpreter that translates this code to HTML and then inserts it into C so that it can be compiled. The C part stays intact. Inside the WDI source it is not necessary to use prints; every return statement will be used for output (in a printf call). The program's output will be clean HTML code.

So, for example a heading 1 tag would be transformed like this:

 h1 { "Hello, " + toCapital(name) + "!"; }
 // would become:
 printf("<h1>Hello, %s!</h1>", toCapital(name));

My main goal is to create an interpreter that translates WDI source to HTML like this:

tag(attributes) {content} => <tag attributes>content</tag>
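To make the mapping above concrete, here is a minimal sketch in C of a recursive-descent translator for a small subset of WDI. It is an illustration under stated assumptions, not a full implementation: the names `Tr`, `put`, `element`, `content` and `wdi_to_html` are hypothetical, content is limited to plain string literals and nested elements (no expressions, `for` loops or comments), and an element ending in `;` instead of a block is assumed to produce no closing tag.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical translator state: input cursor plus a bounded output buffer. */
typedef struct {
    const char *in;   /* next unread character of the WDI source */
    char *out;        /* NUL-terminated output buffer */
    size_t len, cap;  /* bytes used / capacity of out */
    int ok;           /* cleared on syntax error or overflow */
} Tr;

static void put(Tr *t, const char *s, size_t n) {
    if (t->len + n + 1 > t->cap) { t->ok = 0; return; }
    memcpy(t->out + t->len, s, n);
    t->len += n;
    t->out[t->len] = '\0';
}

static void skip_ws(Tr *t) {
    while (isspace((unsigned char)*t->in)) t->in++;
}

static void element(Tr *t);

/* content := STRING ';' | element */
static void content(Tr *t) {
    if (*t->in != '"') { element(t); return; }
    const char *s = ++t->in;
    while (*t->in && *t->in != '"') t->in++;
    if (*t->in != '"') { t->ok = 0; return; }
    put(t, s, (size_t)(t->in - s));          /* copy the literal's text */
    t->in++;
    skip_ws(t);
    if (*t->in == ';') t->in++; else t->ok = 0;
}

/* element := IDENT [ '(' attrs ')' ] ( '{' content* '}' | ';' ) */
static void element(Tr *t) {
    skip_ws(t);
    const char *name = t->in;
    while (isalnum((unsigned char)*t->in)) t->in++;
    size_t nlen = (size_t)(t->in - name);
    if (nlen == 0) { t->ok = 0; return; }
    put(t, "<", 1);
    put(t, name, nlen);
    skip_ws(t);
    if (*t->in == '(') {                     /* attributes: drop commas outside quotes */
        int quoted = 0;
        t->in++;
        put(t, " ", 1);
        while (*t->in && (quoted || *t->in != ')')) {
            if (*t->in == '"') quoted = !quoted;
            if (quoted || *t->in != ',') put(t, t->in, 1);
            t->in++;
        }
        if (*t->in != ')') { t->ok = 0; return; }
        t->in++;
        skip_ws(t);
    }
    put(t, ">", 1);
    if (*t->in == ';') { t->in++; return; }  /* no block: no closing tag */
    if (*t->in != '{') { t->ok = 0; return; }
    t->in++;
    for (skip_ws(t); t->ok && *t->in && *t->in != '}'; skip_ws(t))
        content(t);
    if (!t->ok || *t->in != '}') { t->ok = 0; return; }
    t->in++;
    put(t, "</", 2);
    put(t, name, nlen);
    put(t, ">", 1);
}

/* Translates one element into out; returns 1 on success, 0 on error. */
int wdi_to_html(const char *src, char *out, size_t cap) {
    assert(src && out);
    if (cap == 0) return 0;
    Tr t = { src, out, 0, cap, 1 };
    out[0] = '\0';
    element(&t);
    return t.ok;
}
```

Calling `wdi_to_html` on `div(id="wrapper") { h1 { "Hello"; } }` with a large enough buffer fills it with `<div id="wrapper"><h1>Hello</h1></div>`. Each grammar rule maps to one function, which is what makes the nesting easy to follow.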

Secondly, the HTML code returned by the interpreter has to be inserted into the C code as printf calls. Variables and function calls that occur inside WDI should also be extracted so that they can be used as printf arguments (as with toCapital(name) in the sample source).
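One possible way to do this second step, sketched in C: walk the `+`-separated expression, copy string literals into the format string, and turn every other operand into a `%s` placeholder plus a printf argument. The helper names `append` and `emit_printf` are hypothetical, and the sketch assumes every non-literal operand evaluates to a C string, that literals contain no escapes or `%` characters, and that operands contain `+` only inside parentheses.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Appends n bytes of s to a NUL-terminated buffer; returns 0 on overflow. */
static int append(char *buf, size_t cap, const char *s, size_t n) {
    size_t len = strlen(buf);
    if (len + n + 1 > cap) return 0;
    memcpy(buf + len, s, n);
    buf[len + n] = '\0';
    return 1;
}

/* Hypothetical helper: turns  tag { expr; }  into the text of a printf call.
 * expr is a '+'-separated chain of string literals and C expressions. */
int emit_printf(const char *tag, const char *expr, char *out, size_t cap) {
    char fmt[256] = "", args[256] = "";
    const char *p = expr;
    assert(tag && expr && out);
    for (;;) {
        while (isspace((unsigned char)*p) || *p == '+') p++;
        if (*p == '\0') break;
        if (*p == '"') {                      /* literal: goes into the format */
            const char *s = ++p;
            while (*p && *p != '"') p++;
            if (*p != '"' || !append(fmt, sizeof fmt, s, (size_t)(p - s)))
                return 0;
            p++;
        } else {                              /* expression: %s plus an argument */
            const char *s = p, *e;
            int depth = 0;
            for (; *p && (depth > 0 || *p != '+'); p++) {
                if (*p == '(') depth++;
                else if (*p == ')') depth--;
            }
            for (e = p; e > s && isspace((unsigned char)e[-1]); e--) {}
            if (!append(fmt, sizeof fmt, "%s", 2) ||
                !append(args, sizeof args, ", ", 2) ||
                !append(args, sizeof args, s, (size_t)(e - s)))
                return 0;
        }
    }
    int n = snprintf(out, cap, "printf(\"<%s>%s</%s>\"%s);", tag, fmt, tag, args);
    return n >= 0 && (size_t)n < cap;
}
```

For the tag `h1` and the expression `"Hello, " + toCapital(name) + "!"`, this produces the text `printf("<h1>Hello, %s!</h1>", toCapital(name));`. A real translator would need type information to choose between `%s`, `%d` and so on.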

I am looking for an efficient way to create a lexer and parser for WDI (I want the parser to be fast). I have already tried flex and bison, but I am not sure they are the best tools. Are there any good alternatives? What is the best way to create such an interpreter? Can you recommend some brief literature on the subject?

asked May 20 '10 16:05

Rizo

2 Answers

bison/flex or yacc/lex is the traditional way to do it. IMHO, there is nothing better suited to the task at hand.

Note that the task can't be done with a regular language (e.g. regexes, a simple Perl script, etc.), because WDI blocks nest to arbitrary depth, so you really need a parser.
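A small C sketch of why nesting is the sticking point: even just checking that braces match needs a counter that can grow without bound, which a finite automaton (and therefore a true regular expression) does not have. The function name `braces_balanced` is hypothetical, and escapes inside string literals are not handled.

```c
#include <assert.h>

/* Returns 1 if every '{' in src has a matching '}', else 0.
 * Braces inside double-quoted strings are ignored. */
int braces_balanced(const char *src) {
    long depth = 0;      /* unbounded nesting depth: the non-regular part */
    int in_string = 0;
    assert(src);
    for (const char *p = src; *p; p++) {
        if (*p == '"') in_string = !in_string;
        else if (in_string) continue;
        else if (*p == '{') depth++;
        else if (*p == '}' && --depth < 0) return 0;  /* '}' with no open block */
    }
    return depth == 0 && !in_string;
}
```

A parser does the same bookkeeping implicitly, with its stack taking the place of `depth`.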

Better to do it right. Most probably, a yacc/bison-generated parser will be much cleaner (and faster) than a hand-crafted recursive-descent parser.
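For comparison, the hand-written route would start from a scanner roughly like the following C sketch. It is only an illustration: the names `TokKind`, `Token` and `next_token` are hypothetical, only `//` comments are skipped, string escapes are not handled, and digits and operators are returned one character at a time as punctuation. With flex, the same token kinds would instead be declared as patterns in a specification file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { T_EOF, T_IDENT, T_STRING, T_PUNCT, T_ERROR } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* points into the source text, nothing is copied */
    size_t len;
} Token;

/* Scans one token starting at *pp and advances *pp past it. */
Token next_token(const char **pp) {
    assert(pp && *pp);
    const char *p = *pp;
    for (;;) {                                /* skip whitespace and // comments */
        while (isspace((unsigned char)*p)) p++;
        if (p[0] == '/' && p[1] == '/') {
            while (*p && *p != '\n') p++;
        } else {
            break;
        }
    }
    Token t = { T_EOF, p, 0 };
    if (*p == '\0') { *pp = p; return t; }
    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = T_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (*p == '"') {
        t.kind = T_STRING;                    /* token text includes both quotes */
        p++;
        while (*p && *p != '"') p++;
        if (*p == '"') p++; else t.kind = T_ERROR;
    } else {
        t.kind = T_PUNCT;                     /* single character: { } ( ) ; = + . , */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

A parser would call `next_token` in a loop until it returns `T_EOF` or `T_ERROR`.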

answered Sep 22 '22 10:09

Ingo

If you are really serious about this, what you want to do is to modify an existing C parser. The Edison Design Group C Front End might be an option, although it really wants to be just a C (C++) front end.

Another option is our DMS Software Reengineering Toolkit. DMS can be obtained with a C Front End that contains a full C parser driven entirely from a grammar.

DMS provides direct support for building dialects of languages, and what you want to do is build a dialect of C, so it would support your goal. DMS also provides lots of machinery for building translators, so it would be fairly easy to translate your dialect into real C code and emit it.

answered Sep 22 '22 10:09

Ira Baxter
