Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Finding all the code in a given binary is equivalent to the Halting problem." Really?

Was just reading the highly voted question regarding Emulators and the statement

It's been proven that finding all the code in a given binary is equivalent to the Halting problem.

Really stuck out at me.

Surely that can't be true? Isn't it just a large dependency graph?

Would be really grateful for some further insight into this statement.

like image 474
Maxim Gershkovich Avatar asked Mar 14 '11 13:03

Maxim Gershkovich


3 Answers

I disagree with larsman.

The halting problem says that no program P exists that can take any program and decide whether that program executes the halt instruction. Let me quote wikipedia:

Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. We say that the halting problem is undecidable over Turing machines.

On the other hand we're not trying to make such program/algorithm, but we're trying to find all the code in this/these specific program(s). If we reverse-engineer the program and see that it immediately calls exit() (very optimistic example situation) we have proven that it will call halt, while it was impossible?!

If we we're trying to build an emulator that can run any program we would fail since then you can (easily) reduce that to the Halting problem. But usually you are building an emulator for something like a Game Boy which supports a finite amount of game cartridges (programs) and thus it is possible.

like image 74
orlp Avatar answered Oct 18 '22 21:10

orlp


The halting problem for finite state machines is solvable (although it might take many lifetimes.....of the universe), and any machine you might be emulating is a finite state machine. Just run the program, and the number of steps is bounded by the number of possible states; if that number is exceeded without halting then the program will never halt, since it must be in a loop.

Practically speaking, finding all code is a much easier problem unless the code can use computed gotos. Rather than running the code, you simply take all branches exactly once at each possible branch point.

like image 28
R.. GitHub STOP HELPING ICE Avatar answered Oct 18 '22 20:10

R.. GitHub STOP HELPING ICE


I believe what is meant is "finding all code that is ever executed", i.e. finding coverage, possibly in combination with dynamically generated code. That can indeed be reduced to the halting problem.

Say that you have a perfect coverage tool that will find every piece of code in a program that may ever be executed (so the rest is dead code). Given a program P, this tool would also be able to decide whether the extended program (P ; halt) ever executes the halt instruction, or whether the halt part is dead code. So, it would solve the halting problem, which we know is undecidable.

like image 28
Fred Foo Avatar answered Oct 18 '22 21:10

Fred Foo