Why can't dead code detection be fully solved by a compiler?

Tags:

compiler-theory

People also ask

How will you identify a dead code in a program?

The quickest way to find dead code is to use a good IDE. Delete unused code and unneeded files. In the case of an unnecessary class, Inline Class or Collapse Hierarchy can be applied if a subclass or superclass is used. To remove unneeded parameters, use Remove Parameter.

Which activity can be used to identify and remove dead code?

If you're working with Visual Basic (or VB.NET or VBA), you can use our Project Analyzer to detect and even remove dead code. Project Analyzer shows how much dead and semi-dead code there is and where.

What type of analysis is detecting dead code?

Dead code analysis can be performed using live-variable analysis, a form of static-code analysis and data-flow analysis. This is in contrast to unreachable code analysis which is based on control-flow analysis.

Does Java do dead code elimination?

Eclipse compiler for Java (ECJ) is open source incremented compiler. We reform features of ECJ, related to optimization technique called as dead code detection and elimination. ECJ identifies the dead code. We are extending this compiler to eliminate the dead code.

The dead code problem is related to the Halting problem.

Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.

How does this relate to dead code?

The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.

How do you transfer an algorithm for dead code into an algorithm for the Halting problem?

Simple: you add a line of code after the end of the program you want to check for halt. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that your program halts (gets to the last line, and then to your added line of code).

Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).

These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.

Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!

C# program

using System;
using YourVendor.Compiler;

class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;

class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}

If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLn("Dead code!"); won't be ever executed, so this program actually does have dead code, and the detector was wrong.

But if it returns true, then the line System.Console.WriteLn("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again, the detector was wrong.

So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.

If the halting problem is too obscure, think of it this way.

Take a mathematical problem that is believed to be true for all positive integer's n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any positive even integer greater than two can be represented by the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):

 for (BigInt n = 4; ; n+=2) {
     if (!isGoldbachsConjectureTrueFor(n)) {
         print("Conjecture is false for at least one value of n\n");
         exit(0);
     }
 }

Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader but for this purpose could be a simple iteration over all primes less than n

Now, logically the above must either be the equivalent of:

 for (; ;) {
 }

(i.e. an infinite loop) or

print("Conjecture is false for at least one value of n\n");

as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here in either case. However, in doing so at the very least your compiler would need to solve arbitrarily hard problems. We could provide problems provably hard that it would have to solve (e.g. NP-complete problems) to determine which bit of code to eliminate. For instance if we take this program:

 String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
 for (BigInt n = 0; n < 2**2048; n++) {
     String s = n.toString();
     if (sha256(s).equals(target)) {
         print("Found SHA value\n");
         exit(0);
     }
 }
 print("Not found SHA value\n");

we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that would take of the order of 2^2048 iterations. It would in fact be a great optimisation as I predict the above program would (or might) run until the heat death of the universe rather than printing anything without optimisation.

I don't know if C++ or Java have an Eval type function, but many languages do allow you do call methods by name. Consider the following (contrived) VBA example.

Dim methodName As String

If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If

Application.Run(methodName)

The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.

Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying

Application.Run("Bar")

Is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check to see if that method exists until runtime. If the method isn't called elsewhere, through more normal methods, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.

Unconditional dead code can be detected and removed by advanced compilers.

But there is also conditional dead code. That is code that cannot be known at the time of compilation and can only be detected during runtime. For example, a software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not be real dead code.

There are specific tools that can do testing, resolve dependencies, remove conditional dead code and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see it is beyond the scope of compilers.

A simple example:

int readValueFromPort(const unsigned int portNum);

int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
    std::cout << "Hey! X < 2" << std::endl;
}
else
{
    std::cout << "X is too big!" << std::endl;
}

Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.

However in this basic example:

bool boolVal = /*anything boolean*/;

if (boolVal)
{
  // Do A
}
else if (!boolVal)
{
  // Do B
}
else
{
  // Do C
}

Here the compiler can calculate out the the else block is a dead code. So the compiler can warn about the dead code only if it has enough data to to figure out the dead code and also it should know how to apply that data in order to figure out if the given block is a dead code.

EDIT

Sometimes the data is just not available at the compilation time:

// File a.cpp
bool boolMethod();

bool boolVal = boolMethod();

if (boolVal)
{
  // Do A
}
else
{
  // Do B
}

//............
// File b.cpp
bool boolMethod()
{
    return true;
}

While compiling a.cpp the compiler cannot know that boolMethod always returns true.

Related questions
                            
                                Languages and VMs: Features that are hard to optimize and why
                            
                                Minimum pumping length for the following regular languages
                            
                                Removing Left Recursion in ANTLR
                            
                                Does the compiler decide when to inline my functions (in C++)?
                            
                                How to construct parsing table for LL(k>1)?
                            
                                Are there any "fun" ways to learn about Languages, Grammars, Parsing and Compilers? [closed]
                            
                                What is an example of a lexical error and is it possible that a language has no lexical errors?
                            
                                Efficient way to recursively calculate dominator tree?
                            
                                SLR(1) Parser and epsilon involved
                            
                                What is the reason for the creation of LLVM?
                            
                                Could C++ have not obviated the pimpl idiom?
                            
                                How do C compilers implement functions that return large structures?
                            
                                Steps to creating an NFA from a regular expression
                            
                                Are there such a thing as LL(0) parsers?
                            
                                Learning Resources on Parsers, Interpreters, and Compilers [closed]
                            
                                Register allocation and spilling, the easy way?
                            
                                (When) Should I learn compilers?
                            
                                What's the difference between parse trees and abstract syntax trees? [duplicate]
                            
                                What programming languages are context-free?
                            
                                What's the difference between parse trees and abstract syntax trees (ASTs)?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why can't dead code detection be fully solved by a compiler?

Tags:

compiler-theory

People also ask

Related questions

Recent Activity

Donate For Us