Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which parsers do the modern compilers use?

I wasn't able to find any information on the parsers of some of the modern compilers, especially:

  • TypeScript
  • C#
  • Rust

Are they using LL(k), LR(k), a mixture or something different? Is there any site that lists information like this about compilers/parsers for multiple languages? As these compilers are generally considered "modern", I'm very interested in their parsing techniques.

like image 553
nikeee Avatar asked Mar 07 '18 21:03

nikeee


People also ask

What are parsers in compiler?

In computer technology, a parser is a program that's usually part of a compiler. It receives input in the form of sequential source program instructions, interactive online commands, markup tags or some other defined interface.

What are modern compilers?

of Modern Compilers. 2. 1.1 INTRODUCTION. Compiler is a software that translate one language into another language. The compiler converts the code from high-level language (source code) to low level language (machine code/object code) as shown in figure 1.

Which is the most popular parser?

Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.


1 Answers

All three of them use idiosyncratic, hand-built parsers, loosely based on predictive (top-down) parsing, with whatever kludges and hacks seemed necessary for the cases where top-down parsing is non-deterministic.

I certainly wouldn't recommend this as a model for how to build a parser, if that is what you want to do. It is much easier to build a parser with parser generators; the resulting parser will be shorter and easier to read, and if you do it right, it will be self-documenting in the sense that the grammar used to generate the parser is precisely the grammar you would want to put in the documentation.

That being the case, you might legitimately wonder why so few "modern" languages use parser generators. I don't have a good answer for that, other than the explanations related to programming psychology. Writing a parser by hand is a lot of work and a massive pain, but it's far from the largest task in writing a compiler -- good optimization is a much more complex problem, and there are no simple generator tools for that -- so the labour may not be as much of a disincentive as one would think. It is clearly possible to parse most languages with "only" a few thousand LoC, and that might well seem to be less effort that learning the ins and outs of a parser generator.

One common explanation is that it is apparently easier to produce good compiler error messages with a top-down parser, and good error messages are very important to language adoption. (The pathetically unreadable error messages which resulted from minor typos in C++ templates were clearly a barrier to adoption of C++, for example. Even today, template compile error messages are hard to read, but they have gotten better.) That's not to say that it is impossible to write good compiler error message handlers with a bottom-up grammar, but it is not always obvious how to do so.

Another explanation is that most people seem to want to write their parser (and their compiler) in their own language. The usual strategy is to build a quick-and-dirty minicompiler in some other distasteful language, and use it to bootstrap a gorgeously elegant compiler in the language being developed. (This mentality leads to the odd fact that most languages are first optimised for compiler writing, although it is likely that only one compiler will ever be written in each language.)

The bootstrap strategy makes it difficult to use a parser generator, because the parser generator itself would first have to be rewritten to generate code in the new language. In a perfect world, it would be easy to retrofit code generation for a new language into an existing parser generator, but the reality is quite different. (If you want to take this project on, I strongly recommend using the Lemon parser as a basis rather than any implementation of yacc or bison. It won't be trivial but at least you won't need to learn how to cast spells in M4.)

As noted in this bug report, the major downside of handbuilt parsers is that there is no easy way to verify that they actually recognise the language documented by the published grammar. The Rust source tree now includes a Bison/Flex parser which produces an AST, although it is not used by the Rust compiler (which is written in Rust, see above). Despite its flaws, this grammar is lot closer to the published grammar than the production parser. But it makes no attempt to produce any kind of informative syntax error messages (or, indeed, to generate code).

like image 148
rici Avatar answered Sep 18 '22 14:09

rici