
Proving equivalence of programs

The ultimate in optimizing compilers would be one that searched among the space of programs for a program equivalent to the original but faster. This has been done in practice for very small basic blocks: https://en.wikipedia.org/wiki/Superoptimization
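To make the idea concrete, here is a minimal toy superoptimizer sketch. The instruction set and target function are invented for illustration (nothing here corresponds to a real tool): it exhaustively searches short programs over a tiny accumulator machine, shortest first, and accepts the first one that matches a reference function on every 8-bit input.

```python
# Toy superoptimizer sketch (hypothetical 3-op accumulator machine, for
# illustration only): search programs shortest-first and verify candidates
# exhaustively over the whole 8-bit input space.
from itertools import product

OPS = {
    "shl1": lambda a, x: (a << 1) & 0xFF,   # a = a * 2 (mod 256)
    "addx": lambda a, x: (a + x) & 0xFF,    # a = a + x
    "subx": lambda a, x: (a - x) & 0xFF,    # a = a - x
}

def run(prog, x):
    a = x  # the accumulator starts out holding the input
    for op in prog:
        a = OPS[op](a, x)
    return a

def reference(x):
    return (3 * x) & 0xFF  # the target function: multiply by 3

def superoptimize(max_len=3):
    # Enumerate by increasing length, so the first hit is a shortest program.
    for n in range(1, max_len + 1):
        for prog in product(OPS, repeat=n):
            if all(run(prog, x) == reference(x) for x in range(256)):
                return prog
    return None

print(superoptimize())  # prints ('shl1', 'addx'): 2*x + x == 3*x
```

Even at this scale the cost of the search is visible: the space grows exponentially with program length, which is why real superoptimizers only handle very small basic blocks.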

It sounds like the hard part is the exponential nature of the search space, but actually it's not; the hard part is, supposing you find what you're looking for, how do you prove that the new, faster program is really equivalent to the original?

Last time I looked into it, some progress had been made on proving certain properties of programs in certain contexts, particularly at a very small scale when you are dealing with scalar variables or small fixed-width bit vectors, but not really on proving equivalence of programs at a larger scale, where complex data structures are involved.
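The small-scale case is tractable precisely because fixed-width scalars have a finite input space: "proof" can literally be exhaustive enumeration, which SAT/SMT solvers accomplish without enumerating explicitly. A sketch, using a standard bit trick as the example identity:

```python
# For fixed-width scalars, equivalence is decidable by brute force: check
# every input. Example identity (a standard bit trick): x & -x isolates
# the lowest set bit of x.
def lowest_set_bit_trick(x):
    return x & (-x & 0xFF)

def lowest_set_bit_loop(x):
    bit = 1
    while bit <= 0xFF:
        if x & bit:
            return bit
        bit <<= 1
    return 0  # x == 0 has no set bit

# Exhaustive "proof" over the whole 8-bit input space.
assert all(lowest_set_bit_trick(x) == lowest_set_bit_loop(x)
           for x in range(256))
```

The same approach collapses for heap-allocated data structures, whose state space is unbounded; that is exactly the larger-scale gap described above.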

Has anyone figured out a way to do this yet, even 'modulo solving this NP-hard search problem that we don't know how to solve yet'?

Edit: Yes, we all know about the halting problem. It's defined in terms of the general case. Humans are an existence proof that this can be done for many practical cases of interest.

rwallace asked Nov 07 '22


1 Answer

You're asking a fairly broad question, but let me see if I can get you going.

John Regehr does a really nice job surveying some relevant papers on superoptimizers: https://blog.regehr.org/archives/923

The thing is, you don't really need to prove whole-program equivalence for these types of optimizations. Instead you just need to prove that, given the CPU is in a particular state, two sequences of code modify the CPU state in the same way. To prove this across many optimizations (i.e. at scale), you might typically first throw some random inputs at both sequences. If the two sequences aren't equivalent, you might get lucky and a random input will quickly expose a counterexample, and you can move on. If no counterexample turns up, you can then try to prove equivalence via a computationally expensive SAT solver query. (As an aside, the STOKE paper that Regehr mentions is particularly interesting if you're interested in superoptimizers.)
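The two-stage strategy above can be sketched as follows. This is a simplified illustration, not STOKE's actual implementation: the register state is modeled as a single 8-bit value, and an exhaustive check stands in for the SAT solver query.

```python
# Two-stage equivalence check for two instruction sequences, modeled as
# functions on an 8-bit register value.
import random

MASK = 0xFF  # model an 8-bit register

def seq_a(x):
    return (x * 2) & MASK    # candidate sequence: multiply by 2

def seq_b(x):
    return (x << 1) & MASK   # original sequence: shift left by 1

def equivalent(f, g, trials=64):
    # Stage 1: cheap random testing. A mismatch is a counterexample and
    # lets us reject the candidate immediately.
    for _ in range(trials):
        x = random.randrange(256)
        if f(x) != g(x):
            return False, x
    # Stage 2: the expensive part. Here an exhaustive sweep stands in for
    # the SAT/SMT query a real superoptimizer would issue.
    for x in range(256):
        if f(x) != g(x):
            return False, x
    return True, None

ok, counterexample = equivalent(seq_a, seq_b)
print(ok)  # prints True: x*2 and x<<1 agree on all 8-bit inputs
```

The point of stage 1 is throughput: most randomly generated candidates are wrong, and a handful of random inputs rejects them far more cheaply than a solver call would.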

Now looking at whole-program semantic equivalence, one approach here is the one used by the CompCert compiler. Essentially that compiler proves this theorem:

If CompCert is able to translate C code X into assembly code Y then X and Y are semantically equivalent.

In addition, CompCert does apply a few compiler optimizations, and indeed optimizations are often the areas that traditional compilers get wrong. Perhaps something like CompCert is what you're after. It goes about things via a series of refinement passes, proving that whenever a pass succeeds, its output is semantically equivalent to its input.
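A minimal sketch of the per-pass idea follows. To be clear, this is not CompCert's mechanism (CompCert's passes carry machine-checked Coq proofs); the toy IR, the constant-folding pass, and the validation-by-testing step are all invented here to show the shape of "optimize, then check equivalence with the previous stage".

```python
# Translation-validation sketch over a toy expression IR: an expression is
# an int literal, a variable name, or a tuple ("+", lhs, rhs).

def evaluate(expr, env):
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    return evaluate(lhs, env) + evaluate(rhs, env)

def constant_fold(expr):
    # The optimization pass: collapse additions of two literals.
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        lhs, rhs = constant_fold(lhs), constant_fold(rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            return lhs + rhs
        return (op, lhs, rhs)
    return expr

def validated_pass(expr, test_envs):
    folded = constant_fold(expr)
    # Validation: only accept the optimized form if it agrees with the
    # original on every checked environment.
    for env in test_envs:
        assert evaluate(folded, env) == evaluate(expr, env)
    return folded

expr = ("+", "x", ("+", 2, 3))
print(validated_pass(expr, [{"x": v} for v in range(10)]))  # ('+', 'x', 5)
```

Note the difference in strength: checking a handful of environments only validates this run of the pass, whereas CompCert proves the pass correct once and for all, for every input program.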

user2001002 answered Dec 06 '22