
What is the simplest parsing algorithm that can parse C code?

Tags: c, parsing

Does anyone know what the weakest family of widely-used parsing algorithms is that can parse C code? That is, is the C grammar LL(1), LR(0), LALR(1), etc.? I'm curious because as a side project I'm interested in writing a parser generator for one of these families and would like to ultimately be able to parse C code for another side project.

templatetypedef asked Jan 24 '11 22:01



People also ask

Which parser is used in C?

C is (mostly) parseable with an LALR(1) grammar, although you need to implement some version of the "lexer hack" in order to correctly parse cast expressions.
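To make the "lexer hack" concrete, here is a minimal sketch in C of the feedback it relies on: the parser records each typedef name it sees, and the lexer consults that record to decide whether an identifier-shaped lexeme is an ordinary identifier or a type name. The names `declare_typedef`, `classify_identifier`, `TOK_IDENT` and `TOK_TYPE_NAME` are hypothetical, and the flat table deliberately ignores scoping, which a real C front end would have to handle.

```c
#include <assert.h>
#include <string.h>

/* Token kinds handed to the parser. TOK_TYPE_NAME exists only because of
   the lexer hack: by spelling alone it looks exactly like TOK_IDENT. */
enum token_kind { TOK_IDENT, TOK_TYPE_NAME };

#define MAX_TYPEDEFS 64

/* Hypothetical symbol table: typedef names seen so far (no scoping). */
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count = 0;

/* The parser would call this after recognizing a typedef declaration.
   The pointer is stored, not copied, so the string must outlive the table. */
void declare_typedef(const char *name)
{
    assert(typedef_count < MAX_TYPEDEFS);
    typedef_names[typedef_count++] = name;
}

/* The lexer would call this for every identifier-shaped lexeme. */
enum token_kind classify_identifier(const char *lexeme)
{
    for (int i = 0; i < typedef_count; i++) {
        if (strcmp(typedef_names[i], lexeme) == 0)
            return TOK_TYPE_NAME;
    }
    return TOK_IDENT;
}
```

With this in place, the `T` in `(T)*x` comes back as `TOK_TYPE_NAME` once `T` has been declared with `typedef`, so the grammar can treat the expression as a cast; before that declaration the same text would be read as a multiplication.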

Which of the parsing algorithm is most efficient?

The LR parsing algorithm is one of the most efficient parsing algorithms. It is totally deterministic and no backtracking or search is involved.
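As an illustration of what "deterministic, no backtracking" means in practice, the following is a small hand-coded LR(0) recognizer in C for the toy grammar `S -> ( S ) | x`. The grammar, the state numbering and the function name `lr_accepts` are assumptions chosen for this sketch; a parser generator would normally emit the equivalent tables rather than a `switch`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 128

/* LR(0) states for the toy grammar  S -> ( S ) | x :
   0: start          1: S seen at top level (accept on end of input)
   2: after '('      3: after 'x'   (reduce S -> x)
   4: after '(' S    5: after '(' S ')'   (reduce S -> ( S ))        */
bool lr_accepts(const char *input)
{
    int stack[MAX_DEPTH];   /* stack of automaton states */
    int top = 0;
    const char *p = input;  /* next unread character */

    stack[0] = 0;
    for (;;) {
        int state = stack[top];
        int next, exposed;

        switch (state) {
        case 0:
        case 2:             /* expecting the start of an S */
            if (*p == '(')
                next = 2;
            else if (*p == 'x')
                next = 3;
            else
                return false;
            break;
        case 4:             /* expecting the closing ')' */
            if (*p != ')')
                return false;
            next = 5;
            break;
        case 1:             /* accept only if all input is consumed */
            return *p == '\0';
        case 3:
        case 5:             /* reduce: pop the right-hand side, goto on S */
            top -= (state == 3) ? 1 : 3;
            exposed = stack[top];
            assert(exposed == 0 || exposed == 2);
            stack[++top] = (exposed == 0) ? 1 : 4;
            continue;
        default:
            return false;
        }

        /* shift: push the next state and consume one character */
        if (top + 1 >= MAX_DEPTH)
            return false;   /* nesting too deep for this fixed-size sketch */
        stack[++top] = next;
        p++;
    }
}
```

Every step is decided by the current state and at most the next character, and each input character is consumed exactly once, so the recognizer never has to undo a choice. For example, `lr_accepts("((x))")` succeeds, while `lr_accepts("(x")` fails as soon as the end of input is seen in state 4.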

What is a parsing algorithm?

The Document Parsing algorithm breaks up a document into its most extensive constituents, typically sentences and clauses. The initial step is usually to convert the sentences of the source text into their stem format called the Sentence Graph. Document parsing also includes tokenization.


1 Answer

Bison generates LALR(1) parsers by default. LALR parsers handle a larger class of grammars than LL(1) parsers, but are also more complex to build. From this I suspect that LALR(1) is probably the weakest widely used parsing algorithm that can parse C code.

Unless you're really set on rolling your own recognizer, ANTLR would probably be your best bet. ANTLR uses an LL(*) algorithm: a top-down strategy with arbitrary lookahead, which handles many grammars that LL(1) cannot but is not the same thing as LALR.

David Weiser answered Sep 17 '22 18:09
