I see some interesting discussions here about static vs. dynamic typing. I generally prefer static typing, due to compile-time type checking, better-documented code, etc. However, I do agree that type annotations clutter up the code when done the way Java does it, for example.
So I'm about to start building a functional-style language of my own, and type inference is one of the things I want to implement. I understand that it's a big subject, and I'm not trying to create something that hasn't been done before, just basic inference...
Any pointers on what to read that will help me with this? Preferably something pragmatic/practical as opposed to more theoretical category theory/type theory texts. If there's an implementation-oriented text out there, with data structures and algorithms, that would just be lovely.
Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the type signature of a function, without explicit type annotations having been given.
Standard ML is a strongly and statically typed programming language. However, unlike many other strongly typed languages, the types of literals, values, expressions and functions in a program will be calculated by the Standard ML system when the program is compiled. This calculation of types is called type inference.
Python doesn't do static type inference, because it wants to let you do things that are impossible under such a scheme.
Type inference is a Java compiler's ability to look at each method invocation and corresponding declaration to determine the type argument (or arguments) that make the invocation applicable.
I found the following resources helpful for understanding type inference, in order of increasing difficulty:
However, since the best way to learn is to do, I strongly suggest implementing type inference for a toy functional language by working through a homework assignment of a programming languages course.
I recommend these two accessible homeworks in ML, both of which you can complete in less than a day:
These assignments are from a more advanced course:
Implementing MiniML
Polymorphic, Existential, Recursive Types (PDF)
Bi-Directional Typechecking (PDF)
Subtyping and Objects (PDF)
It's unfortunate that much of the literature on the subject is very dense. I too was in your shoes. I got my first introduction to the subject from Programming Languages: Applications and Interpretation:
http://www.plai.org/
I'll try to summarize the abstract idea, followed by details that I did not find immediately obvious. First, type inference can be thought of as generating and then solving constraints. To generate constraints, you recurse through the syntax tree and generate one or more constraints at each node. For example, if the node is a + operator, the operands and the result must all be numbers. A node that applies a function has the same type as the result of the function, and so on.
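The constraint-generation pass can be sketched in a few lines. This is a minimal illustration in Python, assuming a tiny tuple-based AST with only numbers, +, and if; all node and type representations here are invented for the example:

```python
def generate(expr):
    """Return (type, constraints) for a tiny expression AST.

    Nodes: ("num", n), ("add", e1, e2), ("if", cond, then, els).
    Types are tuples like ("num",) and ("bool",); a constraint is a
    pair of types asserting that the two must be equal.
    """
    tag = expr[0]
    if tag == "num":
        return ("num",), []
    if tag == "add":
        t1, c1 = generate(expr[1])
        t2, c2 = generate(expr[2])
        # the operands and the result of + must all be numbers
        return ("num",), c1 + c2 + [(t1, ("num",)), (t2, ("num",))]
    if tag == "if":
        tc, cc = generate(expr[1])
        tt, ct = generate(expr[2])
        te, ce = generate(expr[3])
        # the condition must be Boolean; the two branches must agree
        return tt, cc + ct + ce + [(tc, ("bool",)), (tt, te)]
    raise ValueError(f"unknown node: {tag}")

ty, cs = generate(("add", ("num", 1), ("num", 2)))
print(ty)   # ("num",)
```

A real generator would also introduce fresh type variables for lambdas and applications; this sketch only covers nodes whose types are fixed.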
For a language without let, you can blindly solve the above constraints by substitution. For example:
(if (= 1 2) 1 2)
here, we can say that the condition of the if statement must be Boolean, and that the type of the if statement is the same as the type of its then and else clauses. Since we know 1 and 2 are numbers, by substitution, we know the if statement is a number.
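That substitution step can be made concrete with a small unification routine. This is a sketch, not a full unifier (it does not recurse into function types or do an occurs check); the tuple representation and the t_if name are illustrative:

```python
def unify(constraints):
    """Solve equality constraints by substitution; return {var_name: type}.

    Types are tuples: ("var", name) is a type variable, ("num",) and
    ("bool",) are concrete types.
    """
    subst = {}

    def resolve(t):
        # follow the substitution until we hit a concrete type or
        # an unbound variable
        while t[0] == "var" and t[1] in subst:
            t = subst[t[1]]
        return t

    for a, b in constraints:
        a, b = resolve(a), resolve(b)
        if a == b:
            continue
        if a[0] == "var":
            subst[a[1]] = b
        elif b[0] == "var":
            subst[b[1]] = a
        else:
            raise TypeError(f"cannot unify {a} with {b}")
    return subst

# Constraints for (if (= 1 2) 1 2), written out by hand:
t_if = ("var", "t_if")
constraints = [
    (("bool",), ("bool",)),   # the condition (= 1 2) is Boolean
    (t_if, ("num",)),         # the then-branch is a number
    (t_if, ("num",)),         # the else-branch is a number
]
subst = unify(constraints)
print(subst["t_if"])   # ("num",)
```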
Where things get nasty, and what I couldn't understand for a while, is dealing with let:
(let ((id (lambda (x) x))) (id id))
Here, we've bound id to a function that returns whatever you pass in, otherwise known as the identity function. The problem is that the type of the function's parameter x is different at each use of id. The second id is a function of type a -> a, where a can be anything. The first is of type (a -> a) -> (a -> a). This is known as let-polymorphism. The key is to solve the constraints in a particular order: first solve the constraints for the definition of id. This will be a -> a. Then fresh, separate copies of the type of id can be substituted into the constraints for each place id is used, for example a2 -> a2 and a3 -> a3.
That wasn't readily explained in the online resources I found. They'll mention Algorithm W or M but not how they work in terms of solving constraints, or why they don't barf on let-polymorphism: each of those algorithms enforces an ordering on solving the constraints.
I found this resource extremely helpful to tie Algorithm W, M, and the general concept of constraint generation and solving all together. It's a little dense, but better than many:
http://www.cs.uu.nl/research/techreps/repo/CS-2002/2002-031.pdf
Many of the other papers there are nice too:
http://people.cs.uu.nl/bastiaan/papers.html
I hope that helps clarify a somewhat murky world.