Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inlining Algorithm

Does anyone know of any papers discussion inlining algorithms? And closely related, the relationship of parent-child graph to call graph.

Background: I have a compiler written in Ocaml which aggressively inlines functions, primarily as a result of this and some other optimisations it generates faster code for my programming language than most others in many circumstances (including even C).

Problem #1: The algorithm has trouble with recursion. For this my rule is only to inline children into parents, to prevent infinite recursion, but this precludes sibling functions inlining once into each other.

Problem #2: I do not know of a simple way to optimise inlining operations. My algorithm is imperative with mutable representation of function bodies because it does not seem even remotely possible to make an efficient functional inlining algorithm. If the call graph is a tree, it is clear that a bottom up inlining is optimal.

Technical information: Inlining consists of a number of inlining steps. The problem is the ordering of the steps.

Each step works as follows:

  • we make a copy of the function to be inlined and beta-reduce by replacing both type parameters and value parameters with arguments.
  • We then replace return statement with an assignment to a new variable followed by a jump to the end of the function body.
  • The original call to the function is then replaced by this body.
  • However we're not finished. We must also clone all the children of the function, beta-reducting them as well, and reparent the clones to the calling function.

The cloning operation makes it extremely hard to inline recursive functions. The usual trick of keeping a list of what is already in progress and just checking to see if we're already processing this call does not work in naive form because the recursive call is now moved into the beta-reduced code being stuffed into the calling function, and the recursion target may have changed to a cloned child. However that child, in calling the parent, is still calling the original parent which calls its child, and now the unrolling of the recursion will not stop. As mentioned I broke this regress by only allowing inlining a recursive call to a child, preventing sibling recursions being inlined.

The cost of inlining is further complicated by the need to garbage collect unused functions. Since inlining is potentially exponential, this is essential. If all the calls to a function are inlined, we should get rid of the function if it has not been inlined into yet, otherwise we'll waste time inlining into a function which is no longer used. Actually keeping track of who calls what is extremely difficult, because when inlining we're not working with an actual function representation, but an "unravelled" one: for example, the list of instructions is being processed sequentially and a new list built up, and at any one point in time there may not be a coherent instruction list.

In his ML compiler Steven Weeks chose to use a number of small optimisations applied repeatedly, since this made the optimisations easy to write and easy to control, but unfortunately this misses a lot of optimisation opportunities compared to a recursive algorithm.

Problem #3: when is it safe to inline a function call?

To explain this problem generically: in a lazy functional language, arguments are wrapped in closures and then we can inline an application; this is the standard model for Haskell. However it also explains why Haskell is so slow. The closures are not required if the argument is known, then the parameter can be replaced directly with its argument where is occurs (this is normal order beta-reduction).

However if it is known the argument evaluation is not non-terminating, eager evaluation can be used instead: the parameter is assigned the value of the expression once, and then reused. A hybrid of these two techniques is to use a closure but cache the result inside the closure object. Still, GHC hasn't succeeded in producing very efficient code: it is clearly very difficult, especially if you have separate compilation.

In Felix, I took the opposite approach. Instead of demanding correctness and gradually improving efficiency by proving optimisations preserved semantics, I mandate that the optimisation defines the semantics. This guarantees correct operation of the optimiser at the expense of uncertainty about what how certain code will behave. The idea is to provide ways for the programmer to force the optimiser to conform to intended semantics if the default optimisation strategy is too aggressive.

For example, the default parameter passing mode allows the compiler to chose whether to wrap the argument in a closure, replace the parameter with the argument, or assign the argument to the parameter. If the programmer wants to force a closure, they can just pass in a closure. If the programmer wants to force eager evaluation, they mark the parameter var.

The complexity here is much greater than a functional programming language: Felix is a procedural language with variables and pointers. It also has Haskell style typeclasses. This makes the inlining routine extremely complex, for example, type-class instances replace abstract functions whenever possible (due to type specialisation when calling a polymorphic function, it may be possible to find an instance whilst inlining, so now we have a new function we can inline).

Just to be clear I have to add some more notes.

Inlining and several other optimisations such as user defined term reductions, typeclass instantiations, linear data flow checks for variable elimination, tail rec optimisation, are done all at once on a given function.

The ordering problem isn't the order to apply different optimisations, the problem is to order the functions.

I use a brain dead algorithm to detect recursion: I build up a list of everything used directly by a each function, find the closure, and then check if the function is in the result. Note the usage set is built up many times during optimisation, and this is a serious bottleneck.

Whether a function is recursive or not can change unfortunately. A recursive function might become non-recursive after tail rec optimisation. But there is a much harder case: instantiating a typeclass "virtual" function can make what appeared to be non-recursive recursive.

As to sibling calls, the problem is that given f and g where f calls g and g calls f I actually want to inline f into g, and g into f .. once. My infinite regress stopping rule is to only allow inlining of f into g if they're mutually recursive if f is a child of g, which excludes inlining siblings.

Basically I want to "flatten out" all code "as much as possible".

like image 745
Yttrill Avatar asked Dec 16 '10 02:12

Yttrill


1 Answers

I realize you probably already know all this, but it seems important to still write it in full, at least for further reference.

In the functional community, there is some litterature mostly from the GHC people. Note that they consider inlining as a transformation in the source language, while you seem to work at a lower level. Working in the source language -- or an intermediate language of reasonably similar semantics -- is, I believe, a big help for simplicity and correctness.

  • GHC Wiki : Inlining (contains a bibliography)
  • Secrets of the Glasgow Haskell inliner

For the question of the ordering compiler passes, this is quite arcane. Still in a Haskell setting, there is the Compilation by Transformation in a Non-strict Functional Language PhD Thesis which discusses the ordering of different compiler passes (and also inlining).

There is also the quite recent paper on Compilation by Equality Saturation which propose a novel approach to optimisation passes ordering. I'm not sure it has yet demonstrated applicability at a large scale, but it's certainly an interesting direction to explore.

like image 96
gasche Avatar answered Sep 28 '22 13:09

gasche