Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a compiler automatically detect pure functions without the type information about purity?

So I'm arguing with my friend who claims that a compiler like GCC can detect a pure function automatically without any type information. I doubt that.

Languages like D or Haskell have purity in their type systems and a programmer explicitly defines what function is pure or not. A pure function has no side effects and can therefore very easily be parallelized.

So the question is: Is this all necessary or not? Could a compiler detect purity, without any meta or type information, just by assuming that anything that does IO or accesses global variables automatically is not pure?

like image 895
Christian Zeller Avatar asked Jan 06 '12 16:01

Christian Zeller


People also ask

What is purity in functional programming?

Functional Purity Purity means that functions have no side effects. The output of a function is predictably the same as long as the input is the same. A pure function isn't affected by and doesn't affect anything outside; it also doesn't mutate any value.

Why do we need to declare a function as pure?

Another reason to use pure functions where possible is testing and refactoring. One of the major benefits of using pure functions is they are immediately testable. They will always produce the same result if you pass in the same arguments. They also makes maintaining and refactoring code much easier.


3 Answers

Sure, you can detect pure functions in some cases. For instance,

int f(int x) {     return x*2; } 

can be detected as pure with simple static analysis. The difficulty is doing this in general, and detecting interfaces which use "internal" state but are externally pure is basically impossible.

GCC does have the warning options -Wsuggest-attribute=pure and -Wsuggest-attribute=const, which suggest functions that might be candidates for the pure and const attributes. I'm not sure whether it opts to be conservative (i.e. missing many pure functions, but never suggesting it for a non-pure function) or lets the user decide.

Note that GCC's definition of pure is "depending only on arguments and global variables":

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.

— GCC manual

Strict purity, i.e. the same results for the same arguments in all circumstances, is represented by the const attribute, but such a function cannot even dereference a pointer passed to it. So the parallelisation opportunities for pure functions are limited, but much fewer functions can be const compared to the pure functions you can write in a language like Haskell.

By the way, automatically parallelising pure functions is not as easy as you might think; the hard part becomes deciding what to parallelise. Parallelise computations that are too cheap, and overhead makes it pointless. Don't parallelise enough, and you don't reap the benefits. I don't know of any practical functional language implementation that does automatic parallelisation for this reason, although libraries like repa parallelise many operations behind the scenes without explicit parallelism in the user code.

like image 136
ehird Avatar answered Oct 15 '22 09:10

ehird


There is another problem. Consider

int isthispure(int i) {    if (false) return getchar();    return i + 42; } 

The function is effectively pure, though it contains impure code, but this code cannot be reached. Now suppose false is replaced by g(i) but we know quite sure that g(i) is false (for example, g might check if its argument is a Lychrel number). To prove that isthispure is indeed pure, the compiler would have to prove that no Lychrel numbers exist.

(I admit that this is a quite theoretical consideration. One could also decide that if a function contains any impure code, it is itself impure. But this is not justified by the C type system, IMHO.)

like image 25
Ingo Avatar answered Oct 15 '22 09:10

Ingo


Determining if a function is pure (even in the limited sense used by GCC) is equivalent to the halting problem, so the answer is "not for arbitrary functions." It is possible to automatically detect that some functions are pure, others are not pure, and flag the rest as "unknown", which allows for automatic parallelization in some cases.

In my experience, even programmers aren't very good at figuring out such things, so I want the type system to help keep track of it for me, not just for the optimizer.

like image 44
Mike William Meyer Avatar answered Oct 15 '22 09:10

Mike William Meyer