Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using deftransform/defknown in SBCL internals to get the compiler to transform user authored functions

At the end of section 6.5 in the current SBCL manual, we have the following quote:

If your system's performance is suffering because of some construct which could in principle be compiled efficiently, but which the SBCL compiler can't in practice compile efficiently, consider writing a patch to the compiler and submitting it for inclusion in the main sources. Such code is often reasonably straightforward to write; search the sources for the string “deftransform” to find many examples (some straightforward, some less so).

I've been playing around and found the likes of sb-c::defknown and sb-c::deftransform but thus far have had little luck in successfully adding any new transforms that do anything.

Lets pretend i have the following 3 toy functions:

(defun new-+ (x y)
  (+ x y))

(defun fixnum-+ (x y)
  (declare (optimize (speed 3) (safety 0))
           (fixnum x y))
  (+ x y))

(defun string-+ (x y)
  (declare (optimize (speed 3) (safety 0))
           (string x y))
  (concatenate 'string x y))

As a purely toy example, lets say we wanted to tell the compiler that it could transform calls to my user defined function new-+ into calls to either fixnum-+ or string-+.

The condition for the compiler transforming (new-+ x y) into (fixnum-+ x y) would be knowing that the arguments x and y are both of type fixnum, and the conditions for transforming into (string-+ x y) would be knowing that the arguments x and y are both of type string.

So the questions:

  1. Can I actually do this?
  2. What are the actual mechanics of doing so and generating other user based transforms/extensions?
  3. Any reading or sources apart from manually reading through the source to discover more info regarding this?
  4. If i can't do this using the likes of deftransform, is there any other way I could do so?

Note: I'm aware of the operations and nature of macros and generic functions in general common lisp coding, and don't consider using them an answer to this question, since I'm specifically curious about extending the SBCL internals and interacting with its compiler.

like image 581
DJMelksham Avatar asked Jun 03 '17 08:06

DJMelksham


2 Answers

You achieve what you want in portable Common Lisp using define-compiler-macro

AFAIK reading the SBCL sources is the only way to learn how deftransform works. But before diving into SBCL sources checkout Paul Khuong's Starting to Hack on SBCL or at the very least The Python Compiler for CMU Common Lisp it links to to have an overview of how SBCL works.

like image 124
PuercoPop Avatar answered Nov 14 '22 02:11

PuercoPop


I now attempt to provide a broad overview that answers my questions and may point others towards constructively investigating similar directions.

  1. Can I actually do this?

Yes. Though depending on the specifics of how and why, you may have a choice of options available to you, and they may have variable levels of portability between Common Lisp implementations.

  1. What are the actual mechanics of doing so and generating other user based transforms/extensions?

I answer this with respect to two possible methods that the programmer may choose to get started, and which seem most applicable.

For both examples, I reiterate that with limited reflection on the topic, i think it bad form to transform relationships between the input/output mappings of a function. I do so here for demonstration purposes only, to verify that the transformations I'm implementing are actually taking place.

I actually had quite a difficult time testing my transformations were actually happening: SBCL especially seems quite happy to optimise certain expressions and forms, there are additional pieces of information you can make available to the compiler not covered here. Additionally, there may be other transformations available, and so just because your transform isn't used, doesn't necessarily mean it isn't "working".

Environments and Define-Compiler-Macro Extensions using Common Lisp the Language 2

I was previously under the impression that DEFINE-COMPILER-MACRO was relatively limited in its abilities, working only on types connected with literal values, but this is not necessarily the case.

To demonstrate this, i use three user-defined functions and a compiler macro.

First: We will begin with a general addition function gen+ that decides at run-time to either add two numbers together, or concatenate two strings:

(defun gen+ (x y)
  (if (and (numberp x)
           (numberp y))
      (+ x y)
      (concatenate 'string x y)))

But say we know at compile time that in certain instances, only strings will be fed to this function. Let's define our specialised string addition function, and to prove its actually being used, we'll do a very bad thing stylistically and additionally concatenate the string "kapow" as well:

(defun string+ (x y)
  (declare (optimize (speed 3) (safety 0))
           (string x y))
  (concatenate 'string x y "kapow"))

The following function is a very simple convenience function that checks an environment to establish whether the declared type of the variable bound in that environment is eq to STRING. We're using a NON-ANSI function here from Common Lisp the Language 2. In sbcl, the function VARIABLE-INFORMATION, and other cltl2 functions are available in the sb-ctlt2 package.

(defun env-stringp (symbol environment)
  (eq 'string
      (cdr (assoc 'type
                  (nth-value 2 (sb-cltl2:variable-information symbol environment))))))

Lastly, we use DEFINE-COMPILER-MACRO to generate the transformation. I've tried to name things in this code differently from other examples I've seen so that people can follow along and not get mixed up with what variable/symbol is in which scope/context. A couple of things I didn't know previously about DEFINE-COMPILER-MACRO.

  • The variable that immediately follows the &whole parameter is a variable which represents the form of the initial call. In our example it will be bound to the list (GEN+ A B)
  • arg1 is bound to the symbol A
  • arg2 is bound to the symbol B
  • The &environment parameter says that within this macro, the symbol ENV will be bound to the environment in which the macro is being evaluated. This is what lets us "kind of step back out of the macro" and check the surrounding code for declarations regarding the type of the variables represented by the symbols bound to 'ARG1' and 'ARG2'

In this definition, we tell the compiler macro that if the user has declared the parameters of GEN+ to be strings, then replace the call to (GEN+ ARG1 ARG2) with a call to (STRING+ ARG1 ARG2).

Note that because the condition of this transformation is the result of a user-defined operation on the environment, if the parameters to GEN+ are literal strings, the transformation will not be triggered, because the environment does not see that the variables have been declared strings. To do that, you would have to add another option and transformation to explicitly check the types of the values in ARG1 and ARG2 as per a traditional use of DEFINE-COMPILER-MACRO. This can be left as an exercise for the reader. But beware about the utility of doing so, because SBCL, for instance, might constant-fold your expression rather than use your transformation anyway.

    (define-compiler-macro gen+ (&whole form arg1 arg2 &environment env)
      (cond ((and (env-stringp arg1 env)
                  (env-stringp arg2 env))
             `(string+ ,arg1 ,arg2))
            (t form)))

Now we can test it with a simple call with type declarations:

(let ((a "bob")
      (b "dole"))
  (declare (string a b))
  (gen+ a b))

This should return the string "bobdolekapow" as the call to GEN+ was transformed into a call to STRING+ based on the declared types of the variables A and B, not just literal types.

Using Basic (defknown)/(deftransform) Combinations with the SBCL Implementation Compiler

The previous technique is indeed potentially useful, more powerful and flexible than transforming on the types of literals, and while not standard ANSI Common Lisp, is more portable/adaptable to other implementations than the technique that follows.

A reason you might forego the former technique in preference of the one that follows, is that the former doesn't get you everything. You still had to declare the types of the variables a and b and write the user-defined function to extract the declared type information from the environment.

If you can interact directly with the SBCL compiler however, with the cost of potentially some brittle-ness and extreme non-portability, you now gain the ability to hack into the compiler itself and gain the benefits of things like type propagation: you might not need to explicitly inform the compiler of the types of A and B for it to implement your transformation.

For our example, we will implement a very basic transformation on the functions wat and string-wat, which are identical in form to our previous functions gen+ and string+.

Understand there are many more pieces of information and optimisation you can feed the SBCL compiler not covered here. And if anyone more experienced with SBCL internals wants to correct/extent anything regarding my impressions here, please comment and i'll be happy to update my answer:

First we tell the compiler about the existence and type signature of wat. We do this by calling defknown in the sb-c package and inform it that wat takes two parameters of any type: (T T) and that it returns a single value of any type: *

(sb-c:defknown wat (T T) *)

Then we define a simple transform using sb-c:deftransform, essentially saying when the two parameters fed to wat are strings, we transform the code into a call to string-wat.

(sb-c:deftransform wat ((x y) (string string) *) 
  `(string-wat x y))

The forms of wat and string-wat for completeness:

(defun wat (x y)
  (if (and (numberp x)
           (numberp y))
      (+ x y)
      (concatenate 'string x y)))

(defun string-wat (x y)
  (declare (optimize (speed 3) (safety 0))
           (string x y))
  (concatenate 'string x y "watpow"))

And this time a demonstration in SBCL using bound variables but no explicit type declarations:

(let ((a (concatenate 'string "bo" "b"))
       (b (concatenate 'string "dole")))
   (wat a b))

And the returned string should be "bobdolewatpow".

  1. Any reading or sources apart from manually reading through the source to discover more info regarding this?

I haven't been able to find anything much about this out there, and would say that to get much deeper, you're going to have to start trawling through some source code.

SBCL github mirror is currently available here.

User @PuercoPop has suggested background reading of Starting to Hack on SBCL and The Python Compiler for CMU Common Lisp, albeit I am including a link to a .pdf version rather than a .ps version commonly linked to.

like image 36
DJMelksham Avatar answered Nov 14 '22 04:11

DJMelksham