Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I implement forward references in a compiler?

I'm creating a compiler with Lex and YACC (actually Flex and Bison). The language allows unlimited forward references to any symbol (like C#). The problem is that it's impossible to parse the language without knowing what an identifier is.

The only solution I know of is to lex the entire source, and then do a "breadth-first" parse, so higher level things like class declarations and function declarations get parsed before the functions that use them. However, this would take a large amount of memory for big files, and it would be hard to handle with YACC (I would have to create separate grammars for each type of declaration/body). I would also have to hand-write the lexer (which is not that much of a problem).

I don't care a whole lot about efficiency (although it still is important), because I'm going to rewrite the compiler in itself once I finish it, but I want that version to be fast (so if there are any fast general techniques that can't be done in Lex/YACC but can be done by hand, please suggest them also). So right now, ease of development is the most important factor.

Are there any good solutions to this problem? How is this usually done in compilers for languages like C# or Java?

like image 834
Zifre Avatar asked May 31 '09 18:05

Zifre


People also ask

How do you do a forward reference?

A forward reference occurs when a label is used as an operand, for example as a branch target, earlier in the code than the definition of the label. The assembler cannot know the address of the forward reference label until it reads the definition of the label.

What is meant by a forward reference in C?

Forward reference is when you declare a type but do not define it. It allows you to use the type by pointer (or reference for C++) but you cannot declare a variable. This is a way to say to the compiler that something exists. Say that you have a Plop structure defined in Plop.

What is a forward declaration in C++?

In C++, Forward declarations are usually used for Classes. In this, the class is pre-defined before its use so that it can be called and used by other classes that are defined before this. Example: // Forward Declaration class A class A; // Definition of class A class A{ // Body };

What is forward reference and reverse reference?

Forward references are not resolved. That is, the use of a symbol at one point in the text which is not defined until some later point gives the wrong result. Backward references, on the other hand, are handled correctly.


2 Answers

It's entirely possible to parse it. Although there is an ambiguity between identifiers and keywords, lex will happily cope with that by giving the keywords priority.

I don't see what other problems there are. You don't need to determine if identifiers are valid during the parsing stage. You are constructing either a parse tree or an abstract syntax tree (the difference is subtle, but irrelevant for the purposes of this discussion) as you parse. After that you build your nested symbol table structures by performing a pass over the AST you generated during the parse. Then you do another pass over the AST to check that identifiers used are valid. Follow this with one or more additional parses over the AST to generate the output code, or some other intermediate datastructure and you're done!

EDIT: If you want to see how it's done, check the source code for the Mono C# compiler. This is actually written in C# rather than C or C++, but it does use .NET port of Jay which is very similar to yacc.

like image 111
U62 Avatar answered Sep 24 '22 15:09

U62


One option is to deal with forward references by just scanning and caching tokens till you hit something you know how to real with (sort of like "panic-mode" error recovery). Once you have run thought the full file, go back and try to re parse the bits that didn't parse before.

As to having to hand write the lexer; don't, use lex to generate a normal parser and just read from it via a hand written shim that lets you go back and feed the parser from a cache as well as what lex makes.

As to making several grammars, a little fun with a preprocessor on the yacc file and you should be able to make them all out of the same original source

like image 29
BCS Avatar answered Sep 21 '22 15:09

BCS