Is it possible to implement Common Lisp's macro system in scheme?

1 Answers

I'll try to be brief -- which is hard since this is generally a very deep issue, more than the average SO level of Q&A... so this is still going to be pretty long. I'll also try to be unbiased; even though I come from a Racket perspective, I have been using Common Lisp in the past, and I always liked to use macros in both worlds (and in others, actually). It won't look like a direct answer to your question (at an extreme end, that would be just "yes"), but it is comparing the two systems and hopefully this will help in clarifying the issue for people -- especially people outside of lisp (all flavors) who would wonder why is this such a big deal. I'll also describe how defmacro can be implemented in "syntax-case" systems, but only generally to keep things clear (and since you can just find such implementations, some given in the comments and in other answers).

First, your question is very much not redundant -- it is very justified, and (as I implied), one of the things that Lisp newcomers to Scheme and Scheme newcomers to Lisp encounter.

Second, the very shallow, very short answer is what people have told you: yes, it's possible to implement CL's defmacro in a Scheme that supports syntax-case, and as expected you've got multiple pointers to such implementations. Going the other way and implementing syntax-case using simple defmacro is a trickier subject that I won't talk about too much; I'll just say it has been done, only at a very high cost of re-implementing lambda and other binding constructs, which means that it's basically a re-implementation of a new language which you should commit to if you want to use that implementation.

And another clarification: people, especially CLers, very often collapse two things related to Scheme macros: hygiene, and syntax-rules. The thing is that in R5RS all you have is syntax-rules, which is a very limited pattern-based rewrite system. As with other rewrite systems, you can use it naively, or get all the way up to using rewrites to define a small language which you can then use to write macros in. See this text for a known explanation on how this is done. While it's possible to do that, the bottom line is that it's hard, you're using some weird small language thing that is not directly related to your actual language, and this makes it very far from Scheme programming -- probably in an even worse way that using the hygienic macro implementation in CL is not really using plain CL. In short, it's possible to use only syntax-rules, but this is mostly in a theoretical sense, not something that you'll want to use in "real" code. The main point here is that hygiene does not imply being limited to syntax-rules.

However, syntax-rules was not intended as "the" Scheme macro system -- the idea was always that you have some "low-level" macro implementation which is used to implement syntax-rules but can also implement hygiene-breaking macros -- it's just that there was no agreement on the particular low-level implementation. R6RS fixes that by standardizing the "syntax-case" macro system (note that I'm using "syntax-case" as the name of the system, different from syntax-case which is the form that is it's main highlight). As if to make a point that the discussion is still alive, R7RS took a step back and excluded it, reverting back to having syntax-rules with no commitment on a low-level system, at least as far as the "small language" goes.

Now, to actually understand the difference between the two systems, the best thing to clarify is the difference between the types that they're dealing with. With defmacro, a transformer is basically a function that takes in an S-expression(s) and returns an S-expression. An S-expression here is a type that is made of a bunch of literal types (numbers, strings, booleans), symbols, and list-nested structures of these. (The actual types used are a little more than that, but that's enough to make the point.) The thing is that this is a very simple world: you're getting in something that is very concrete -- you can actually print the input/output values and that's all you have. Note that this system uses symbols to denote identifiers -- and a symbol is something very concrete in just that sense: an x is a piece of code that has just that name, x.

However, this simplicity comes at a cost: you cannot use it for hygienic macros since you have no way to distinguish two different identifiers that are both called x. The usual CL-based defmacro does have some addtional bits that compensate for some of this. One such bit is gensym -- a tool to create "fresh" symbols that are uninterned and therefore are guaranteed to be different from any other symbol, including ones that have the same name. Another such bit is the &environment argument to defmacro transformers, which holds some representation of the lexical environment of the location where the macro is used.

It's obvious that these things complicate the defmacro world since it's no longer dealing with plain printable values, and since you need to be aware of some representation of an environment -- which makes it even more clear that a macro is actually a piece of code that is a compiler hook (since this environment is essentially some datatype that the compiler usually deals with, and one that is more complicated than just S-expressions). But as it turns out, they're not enough to implement hygiene. Using gensym you can get one easy aspect of hygiene knocked off (avoiding macro-side captures of user code), but the other aspect (avoiding user code capturing macro code) is still left open. Some people settle for this with arguing that the kind of capture that you can avoid is sufficient -- but when you're dealing with a modular system where a macro's environment often has different bindings than the ones used in its implementation, the other side becomes much more important.

Switch over to syntax-case macro systems (and happily skipping over syntax-rules, which is trivially implemented using syntax-case). In this system, the idea is that if plain symbolic S-expressions are not expressive enough to represent the complete lexical knowledge (ie, the difference between two different bindings, both called x), then we're going to "enrich" them and use a datatype that does. (Note that there are other low-level macro systems that take different approaches to providing the extra information, like explicit renaming and syntactic closures.)

The way that this is done is by making macro transformers be functions that consume and return "syntax objects", which are exactly that kind of a representation. More precisely, these syntax objects are usually built on top of the plain symbolic representation, only wrapped in structures that have additional information that represent the lexical scope. In some systems (notably in Racket), everything gets wrapped in syntax objects -- symbols as well as other literals and lists. Given this, it's not surprising that it's easy to get S-expressions out of syntax objects: you just pull out the symbolic contents, and if it's a list, then continue doing that recursively. In syntax-case systems, this is done by syntax-e which implements the accessor for the symbolic content of a syntax object, and syntax->datum that implements the versions that goes down the result recursively to produce a full S-expression. As a side-note, this is a rough explanation why in Scheme people don't talk about bindings being represented as symbols, but as identifiers.

On the other side, the question is how do you start from a given symbolic name and construct such a syntax object. The way that this is done is with the datum->syntax function -- but instead of making the api specify how is the lexical scope information represented, the function takes in a syntax object as a first argument and a symbolic S-expression as the second argument, and it creates a syntax object by properly wrapping the S-expression with the lexical scope information taken from the first. This means that to break hygiene what you usually do is start with a user-supplied syntax object (eg, a body form of a macro), and use its lexical information to create some new identifier like this that is visible in the same scope.

This quick description is enough to see how the macros that you were shown work. The macro that @ChrisJester-Young showed simply takes in the syntax object(s), strips it to a raw S-expression with syntax->datum, sends that to the defmacro transformer and gets back an S-expression, then it uses syntax->datum to convert the result back to a syntax object using the user code's lexical context. Racket's defmacro implementation is a bit fancier: during the stripping stage it keeps a hash table that maps the resulting S-expressions to their original syntax objects, and during the reconstruction step it consults this table to get the same context as the bits of code originally had. This makes it a more robust implementation for some more complex macros, but it's also more useful in Racket since syntax objects have much more information in them, like source location, properties, etc, and this careful reconstruction will usually result in output values (syntax objects) that keep the information they had on their way into the macro.

For a slightly more technical introduction for defmacro programmers to the syntax-case system, see my writing syntax-case macros blog post. If you're coming from the Scheme side it won't be as useful, but it will still be useful in clarifying the whole issue.

To get this closer to a conclusion, I should note that dealing with unhygienic macros can still be tricky. More specifically, there are various ways to achieve such bindings, but they are different in various subtle ways, and can usually come back and bite you leaving slightly different teeth marks in each case. In a "true" defmacro system like CL, you learn to live with a specific set of teeth marks, ones that are relatively well known, and therefore there are things that you just don't do. Most notably here is the kind of modular composition of languages with different bindings for the same names that Racket uses so frequently. In syntax-case systems a better approach is fluid-let-syntax that is used to "adjust" the meaning of a lexically scoped name -- and more recently, that has evolved into "syntax parameters". There is a good overview of the problems of hygiene-breaking macros, which includes a description of how you can attempt to solve it with just hygienic syntax-rules, with basic syntax-case, with CL-style defmacro, and finally with syntax parameters. This text gets a bit more technical, but its relatively easy to read the first few pages, and if you understand this, then you will have a very good picture of the whole debate. (There is also an older blog post that is covered better in the paper.)

I should also mention that this is far from being the only "hot" issue around macros. The debate within Scheme circles about which low-level macro system is better can get pretty hot at times. And there are other issues around macros like the question of how to make them work in a module system where a library can provide macros as well as values and functions, or whether to separate macro-expansion time and runtime into separate phases, and more.

Hopefully this presents a more complete picture of the issue, to the point of being aware of the tradeoffs and being able to decide for yourself what works best for you. I also hope that this clarifies some of the sources for the usual flames: hygienic macros are certainly not useless, but since the new type is more than just simple S-expressions, there is more functionality around them -- and all too often shallow-reading bypassers jump to a conclusion that "it's too complex". Even worse are flames in the spirit of "in the Scheme world people know close to nothing about metaprogramming": being very painfully aware of the added cost and also of the desired benefits, people in the Scheme world has spent orders of magnitude more collective efforts on the subject. It's a fine choice to stick with defmacro if the extra wrappers around S-expressions are too complicated for your taste, but you should be aware of the cost involved in learning that vs what you pay by dumping hygiene (and vs what you get by embracing it).

Unfortunately, macros of any flavor are overall a pretty hard subject for newbies (perhaps excluding the extremely limited syntax-rules), so people tend to find themselves in the middle of such flames without having enough experience to know your left from your right. Ultimately, nothing beats having good experience in both worlds to clarify the tradeoffs. (And that's from very concrete personal experience: had PLT Scheme not switch to syntax case N years ago, I would have probably never bother with it... Once they did switch, it took me a long while to convert my code -- and only then did I realize just how great it is to have a robust system where no names get "confused" by mistake (which would result in weird bugs, and obfuscated %%__names__).)

(Still, it's very likely that comment flames will happen...)

168

answered Oct 15 '22 04:10

Eli Barzilay

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is it possible to implement Common Lisp's macro system in scheme?

Tags:

category

People also ask

1 Answers

Eli Barzilay

Recent Activity

Donate For Us

Is it possible to implement Common Lisp's macro system in scheme?

Tags:

category

People also ask

1 Answers

Eli Barzilay

Related questions

Recent Activity

Donate For Us