
What techniques are used to write a parser that switches between languages?

I'm interested in how a parser like the Razor view engine can parse two distinct languages like C# and JavaScript.

It's very cool that the following works, for instance:

$("#fm_duedate").val('@DateTime.Now.AddMonths(1).ToString("MM/dd/yyyy")');

I'm going to try to look at the source, but I'm curious whether there is some kind of theoretical foundation for a parser like this, or whether it is more brute force, like taking the union of the two languages and parsing that.

Trying to reason it out for myself, I say "you start with a parser for each language, then you add to each one a set of productions that switch it to the other", but I doubt it's so simple.

I guess the perfect answer would be a pointer to a discussion of how the Razor engine is implemented or a walk-through of the source (I haven't actually Googled this for fear of going down a rabbit hole). Alternatively, just some insight into how the problem of parsing two languages is approached would be great.

Aaron Anodide asked Dec 03 '25 02:12


2 Answers

As Corey points out, Razor and similar frameworks do not do anything particularly fancy.

However, there are some more theoretically sound models for building parsers for languages where one language is embedded in another. My erstwhile colleague Luke Hoban has a great introductory article on parser combinators, which afford a very nice way to build a parser for one-language-embedded-in-another-language scenarios:

http://blogs.msdn.com/b/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx

The Wikipedia page is pretty straightforward as well:

http://en.wikipedia.org/wiki/Parser_combinator
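To make the idea concrete, here is a minimal sketch of parser combinators in Python. It is not taken from the linked article (which uses C#), and every name in it (`char_if`, `seq`, `alt`, `template`, and so on) is made up for illustration. The host "language" is plain text, the embedded "language" is a sum of integers, and the switch between them is just one more parser, `embedded`, composed from the other two.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None when it does not match at that position.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char_if(pred: Callable[[str], bool]) -> Parser:
    """Match one character that satisfies pred."""
    def parse(s: str, i: int):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def literal(token: str) -> Parser:
    """Match an exact string."""
    def parse(s: str, i: int):
        if s.startswith(token, i):
            return token, i + len(token)
        return None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(s: str, i: int):
        values = []
        for p in parsers:
            result = p(s, i)
            if result is None:
                return None
            value, i = result
            values.append(value)
        return values, i
    return parse

def alt(*parsers: Parser) -> Parser:
    """Return the result of the first parser that succeeds."""
    def parse(s: str, i: int):
        for p in parsers:
            result = p(s, i)
            if result is not None:
                return result
        return None
    return parse

def many(p: Parser) -> Parser:
    """Apply p zero or more times; always succeeds."""
    def parse(s: str, i: int):
        values = []
        while True:
            result = p(s, i)
            if result is None or result[1] == i:  # stop on failure or no progress
                return values, i
            value, i = result
            values.append(value)
    return parse

def many1(p: Parser) -> Parser:
    """Apply p one or more times."""
    def parse(s: str, i: int):
        result = many(p)(s, i)
        return result if result[0] else None
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p."""
    def parse(s: str, i: int):
        result = p(s, i)
        return None if result is None else (f(result[0]), result[1])
    return parse

# Embedded language (assumed for the example): sums such as 1+2+3.
number = fmap(many1(char_if(lambda c: c in "0123456789")), lambda cs: int("".join(cs)))
expr = fmap(seq(number, many(fmap(seq(literal("+"), number), lambda r: r[1]))),
            lambda r: r[0] + sum(r[1]))

# Host language: any run of characters other than '@'.
text_chunk = fmap(many1(char_if(lambda c: c != "@")), lambda cs: ("text", "".join(cs)))

# The language switch: '@(' enters the embedded parser, ')' returns to the host.
embedded = fmap(seq(literal("@("), expr, literal(")")), lambda r: ("expr", r[1]))

template = many(alt(embedded, text_chunk))
```

Calling `template("Total: @(1+2+3) items", 0)` returns `([('text', 'Total: '), ('expr', 6), ('text', ' items')], 21)`. The point of the design is that neither sub-parser knows about the other; the embedding is expressed only in how they are combined.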

Eric Lippert answered Dec 04 '25 14:12


Razor and the other view engines do not parse the HTML or JavaScript of a view. Instead, they scan the text for specific tokens, with no real concern for the surrounding text.

In the case of Razor, every @ character in the source file is processed as a code block of some sort. Razor is quite smart about detecting the expression that follows the @ character, including handling things like @foreach (var x in collection) { and locating the closing } while not trying to parse the HTML (or JavaScript) inside. It also lets you use @{ } and @( ) to override the processing to a degree.
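The scanning approach described above can be sketched roughly as follows. This is a toy in Python, not Razor's actual algorithm: the function names are hypothetical, and it only handles `@@` escapes, explicit `@( )` expressions, and implicit expressions made of identifiers, member access, calls, and indexers. It does not attempt `@foreach` blocks, `@{ }`, or cases such as email addresses.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def _skip_balanced(text, i, open_ch, close_ch):
    """text[i] is open_ch; return the index just past its matching close_ch."""
    depth = 0
    quote = None
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1          # skip the escaped character inside a string
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch          # brackets inside string literals do not count
        elif ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced " + open_ch)

def _scan_implicit(text, i):
    """Scan an expression such as Name.Member(args)[0].Other starting at i."""
    m = IDENT.match(text, i)
    if not m:
        return i
    i = m.end()
    while i < len(text):
        if text[i] == "(":
            i = _skip_balanced(text, i, "(", ")")
        elif text[i] == "[":
            i = _skip_balanced(text, i, "[", "]")
        elif text[i] == ".":
            m = IDENT.match(text, i + 1)
            if not m:
                break           # a trailing '.' belongs to the markup
            i = m.end()
        else:
            break
    return i

def split_template(text):
    """Split text into ("markup", ...) and ("code", ...) spans at '@'."""
    spans, literal, i = [], [], 0
    while i < len(text):
        if text[i] != "@":
            literal.append(text[i])
            i += 1
            continue
        if text.startswith("@@", i):            # escaped '@'
            literal.append("@")
            i += 2
            continue
        if text.startswith("@(", i):            # explicit expression
            end = _skip_balanced(text, i + 1, "(", ")")
            code = text[i + 2:end - 1]
        else:                                   # implicit expression
            end = _scan_implicit(text, i + 1)
            code = text[i + 1:end]
        if not code:                            # lone '@': keep it as markup
            literal.append("@")
            i += 1
            continue
        if literal:
            spans.append(("markup", "".join(literal)))
            literal = []
        spans.append(("code", code))
        i = end
    if literal:
        spans.append(("markup", "".join(literal)))
    return spans

if __name__ == "__main__":
    line = """$("#fm_duedate").val('@DateTime.Now.AddMonths(1).ToString("MM/dd/yyyy")');"""
    for kind, span in split_template(line):
        print(kind, repr(span))
```

Run on the line from the question, this yields three spans: the JavaScript up to the opening quote as markup, `DateTime.Now.AddMonths(1).ToString("MM/dd/yyyy")` as code, and `');` as markup. The JavaScript is never parsed; only the text following `@` is examined.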

I find the ASPX <%...%> format simpler to read, since I've used that format more and I've got some established pattern recognition going on for those. Having explicit start/finish tokens is simpler to process and simpler to read in-place.

Corey answered Dec 04 '25 16:12


