Does anyone have insight into the typical big-O complexity of a compiler?
I know it must be >= n
(where n is the number of lines in the program), because it needs to scan each line at least once.
I believe it must also be >= n.logn
for a procedural language, because the program can introduce O(n) variables, functions, procedures, and types etc., and when these are referenced within the program it will take O(log n) to look up each reference.
Beyond that my very informal understanding of compiler architecture has reached its limits and I am not sure if forward declarations, recursion, functional languages, and/or other tricks will increase the algorithmic complexity of the compiler.
So, in summary:
For a 'typical' procedural language (C, pascal, C#, etc.) is there a limiting big-O for an efficiently designed compiler (as a measure of number of lines)
For a 'typical' functional language (lisp, Haskell, etc.) is there a limiting big-O for an efficiently designed compiler (as a measure of number of lines)
This question is unanswerable in it's current form. The complexity of a compiler certainly wouldn't be measured in lines of code or characters in the source file. This would describe the complexity of the parser or lexer, but no other part of the compiler will ever even touch that file.
After parsing, everything will be in terms of various AST's representing the source file in a more structured manner. A compiler will have a lot of intermediate languages, each with it's own AST. The complexity of various phases would be in terms of the size of the AST, which doesn't correlate at all to the character count or even to the previous AST necessarily.
Consider this, we can parse most languages in linear time to the number of characters and generate some AST. Simple operations such as type checking are generally O(n)
for a tree with n
leaves. But then we'll translate this AST into a form with potentially, double, triple or even exponentially more nodes then on the original tree. Now we again run single pass optimizations on our tree, but this might be O(2^n)
relative to the original AST and lord knows what to the character count!
I think you're going to find it quite impossible to even find what n
should be for some complexity f(n)
for a compiler.
As a nail in the coffin, compiling some languages is undecidable including java, C# and Scala (it turns out that nominal subtyping + variance leads to undecidable typechecking). Of course C++'s templating system is turing complete which makes decidable compilation equivalent to the halting problem (undecidable). Haskell + some extensions is undecidable. And many others that I can't think of off the top of my head. There is no worst case complexity for these languages' compilers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With