What techniques are used to write a parser that switches between languages?

Question

I'm interested in how a parser like the Razor view engine can parse two distinct languages like C# and JavaScript.

It's very cool that the following works, for instance:

$("#fm_duedate").val('@DateTime.Now.AddMonths(1).ToString("MM/dd/yyyy")');

I'm going to try to look at the source, but I'm curious whether there's some kind of theoretical foundation for a parser like this, or whether it's more brute force, like taking the union of the two languages and parsing that.

Trying to reason it out for myself, I'd say "you start with a parser for each language, then you add to each one a set of productions that switch it to the other", but I doubt it's so simple.

I guess the perfect answer would be a pointer to a discussion of how the Razor engine is implemented, or a walk-through of the source (I haven't actually Googled this for fear of going down a rabbit hole). Alternatively, just some insight into how the problem of parsing two languages is approached would be great.

Eric Lippert · Accepted Answer

As Corey points out, Razor and similar frameworks do not do anything particularly fancy.

However, there are more theoretically sound models for building parsers for languages where one language is embedded in another. My erstwhile colleague Luke Hoban has a great introductory article on parser combinators, which afford a very nice way to build a parser for one-language-embedded-in-another-language scenarios:

http://blogs.msdn.com/b/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx

The Wikipedia page is pretty straightforward as well:

http://en.wikipedia.org/wiki/Parser_combinator
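
To make the idea concrete, here is a minimal sketch of parser combinators in Python. It is not taken from the linked article (which uses C#), and every name in it (regex, seq, alt, many, fmap, inner_sum, template, parse_template) is made up for illustration. The "inner" language is just sums of integers and the "outer" language is literal text with @( ... ) islands. The point is only that the inner parser is an ordinary value that gets composed into the outer grammar.

```python
import re

# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(parser):
    """Apply a parser zero or more times, stopping when it fails or stalls."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(parser, fn):
    """Transform the value produced by a parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse

# Inner language (hypothetical): sums of integers such as "1+2+3".
number = fmap(regex(r"\d+"), int)
inner_sum = fmap(
    seq(number, many(seq(regex(r"\+"), number))),
    lambda v: ("code", v[0] + sum(n for _, n in v[1])),
)

# Outer language (hypothetical): literal text with "@( ... )" islands.
literal = fmap(regex(r"[^@]+"), lambda s: ("text", s))
island = fmap(seq(regex(r"@\("), inner_sum, regex(r"\)")), lambda v: v[1])
template = many(alt(island, literal))

def parse_template(source):
    parts, end = template(source, 0)
    if end != len(source):
        raise ValueError(f"parse error at offset {end}")
    return parts
```

With these definitions, parse_template("total: @(1+2+3) items") yields a text part, a code part carrying the value 6, and another text part. Swapping inner_sum for a parser of a richer language would not require touching the outer grammar, which is what makes the approach attractive for embedded languages.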

Corey · Answer

Razor (and the other view engines) do not parse the HTML or JavaScript of a view. Instead they parse the text to detect specific tokens, with no real concern about the surrounding text.

In the case of Razor, every @ character in the source file is processed as a code block of some sort. Razor is quite smart about detecting the expression that follows the @ character, including handling things like @foreach (var x in collection) { and locating the closing } while not trying to parse the HTML (or JavaScript) inside. It also lets you use @{ } and @( ) to override the processing to a degree.
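
As a rough illustration of that style of scanning, here is a small Python sketch. It is not Razor's actual algorithm, and the names (skip_balanced, read_implicit_expression, split_template) are invented for this example. It assumes only two forms, @( ... ) and @identifier followed by member accesses, calls, or indexers, and it treats a lone @ as plain text. Everything outside those forms is copied through without being parsed.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def skip_balanced(src, pos, open_ch, close_ch):
    """Return the index just past the bracket that matches src[pos]."""
    depth = 0
    i = pos
    while i < len(src):
        ch = src[i]
        if ch == '"':
            # Skip a double-quoted string so brackets inside it are ignored.
            i += 1
            while i < len(src) and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
        elif ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced bracket")

def read_implicit_expression(src, pos):
    """Consume identifier (.identifier | (...) | [...])* starting at pos."""
    m = IDENT.match(src, pos)
    if not m:
        return pos
    i = m.end()
    while i < len(src):
        if src[i] == "(":
            i = skip_balanced(src, i, "(", ")")
        elif src[i] == "[":
            i = skip_balanced(src, i, "[", "]")
        elif src[i] == "." and IDENT.match(src, i + 1):
            i = IDENT.match(src, i + 1).end()
        else:
            break
    return i

def split_template(src):
    """Split src into ("text", ...) and ("code", ...) parts."""
    parts = []
    pos = 0
    while pos < len(src):
        at = src.find("@", pos)
        if at == -1:
            parts.append(("text", src[pos:]))
            break
        if at > pos:
            parts.append(("text", src[pos:at]))
        if src.startswith("@(", at):
            end = skip_balanced(src, at + 1, "(", ")")
            parts.append(("code", src[at + 2:end - 1]))
        else:
            end = read_implicit_expression(src, at + 1)
            if end == at + 1:
                # Assumption for this sketch: a lone "@" stays as text.
                parts.append(("text", "@"))
            else:
                parts.append(("code", src[at + 1:end]))
        pos = end
    return parts
```

Run on the line from the question, split_template returns the JavaScript before the @ as text, DateTime.Now.AddMonths(1).ToString("MM/dd/yyyy") as code, and the trailing '); as text. The surrounding JavaScript is never tokenized; the scanner only needs enough knowledge of the inner language to find where the expression ends.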

I find the ASPX <%...%> format simpler to read, since I've used that format more and I've got some established pattern recognition going on for those. Having explicit start/finish tokens is simpler to process and simpler to read in-place.
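
To show why explicit delimiters are simpler to process, here is a sketch in Python that splits a template on <% ... %> pairs with a single regular expression. The function name split_aspx_style is made up, and the sketch deliberately ignores variants such as <%= %> and the case of a %> appearing inside a string literal in the code.

```python
import re

# Non-greedy match of everything between "<%" and the next "%>".
BLOCK = re.compile(r"<%(.*?)%>", re.DOTALL)

def split_aspx_style(src):
    """Split src into ("text", ...) and ("code", ...) parts."""
    parts = []
    pos = 0
    for m in BLOCK.finditer(src):
        if m.start() > pos:
            parts.append(("text", src[pos:m.start()]))
        parts.append(("code", m.group(1).strip()))
        pos = m.end()
    if pos < len(src):
        parts.append(("text", src[pos:]))
    return parts
```

No knowledge of the embedded language is needed to find where a code block ends, which is the trade-off against the lighter @ syntax.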

Tags:

c#

asp.net-mvc

razor

viewengine

Aaron Anodide

