Can someone suggest articles that explain the concept of Homoiconicity, especially using Clojure. Why is it that Clojure is homoiconic but its hard to do that in other languages such as Java ?
Languages such as Lisp and its dialects, such as Scheme, Clojure, and Racket employ S-expressions to achieve homoiconicity.
C is not homoiconic, whether the compiler is implemented in C or in Pascal, and C cannot be made homoiconic without adding fundamentally new and different constructs to the language definition.
Forth code is stored in characters, in the form of source; Forth is fully homoiconic when it's used to manipulate its source.
Racket is a homoiconic language which means that everything in the language is represented the same way. All code and data are represented with S-expressions.
Before I proceed with some things I wanted to add another answer for, here's one more reference -- the part related to homoiconicity is fairly short, but it is Rich Hickey doing the explaining! Channel 9 has this nice video with Rich Hickey and Brian Beckman talking about Clojure. Concurrency is, understandably, the major focus, but homoiconicity does get its own (short) moment of screen time during which Rich nicely explains the interplay between read
(the function which converts concrete syntax as written down by the programmer to the internal representation built out from lists etc.) and eval
. He has this nice diagram showing how eval
never even knows that the code it evaluates comes from read
operating on a text file... Arthur has already explained the gist behind that, but hey, watch it anyway, it's a very nice video!
A disclaimer: I'll be mentioning Java and Python as examples below the next horizontal bar. I want to make clear that the following is just a rough sketch of why I think it might be difficult to make a homoiconic, Lisp-style-macro-enabled Java or Python; it's just an academic exercise, though, and I don't want to consider the question of whether there's any reason to try in the first place. Also, I don't want to imply that the syntax of a language with Lisp style macros must contain explicit delimiters for tree structures; Dylan (the paren-less Lisp?) apparently provides a counterexample. Finally, I use the expression Lisp style macros because I'm only examining Lisp style macros. The language Forth, for example, has a different macro facility which I don't really understand except that I know it to enable wicked cool looking code. Apparently syntax extensions can be implemented in a number of ways. With this out of the way...
I'd like to address the second part of your question -- how is it that most programming languages are considered not to be homoiconic? I'll have to touch upon the semantics of Lisp in the process, but since Nils has already provided links to good sources of information on the term "homoiconic" itself and Arthur has described the read -> macro expand -> compile cycle as found in Clojure, I'll be building on that in what follows. To start things off, let me quote a passage from Alan Kay (extracted from the Wikipedia article which also links to the original source):
[...] Interactive LISP [...] and TRAC [...] both are "homoiconic" in that their internal and external representations are essentially the same.
(Those [...] bits hide a lot of text, but the gist is unchanged.)
Now, let's ask ourselves the question: what is Java's internal representation of Java? ... Well, this doesn't even make sense. The Java compiler does have a certain internal representation of Java, namely an abstract syntax tree; to construct a "homoiconic Java", we'd have to make that AST representation a first-class object in Java and devise a syntax which would allow us to write ASTs directly. That could prove to be rather hard.
Python provides an example of a non-homoiconic language which is interesting in that it currently ships with an AST-manipulation toolkit in the form of the ast
module. The docs for that module explicitly state that Python ASTs may change between releases, which may or may not be discouraging; still, I suppose an industrious programmer could take the ast
module, devise a syntax (maybe S-expression based, maybe XML-based) for describing Python ASTs directly and construct a parser for that syntax in regular Python using ast
, thus taking a solid first step towards creating a homoiconic language with Python semantics. (I believe I came across a dialect of Lisp compiling to Python bytecode some time ago... I wonder if it might be doing something like that at some level?)
Even then the problem remains of extracting concrete benefits from that kind of homoiconicity. It's viewed as a beneficial property of members of the Lisp family of languages because it allows us to write programmes which write further programmes, among which macros are the most notable. Now, while macros are enabled in one way by the fact that it is so easy to manipulate the internal representation of Lisp code in Lisp, they are also enabled in an equally important way by the Lisp execution model: a Lisp programme is just a collection of Lisp forms; these are processed by the Lisp function eval
which is responsible for determining the values of the expressions and causing the appropriate side-effects at the correct time; the semantics of Lisp are exactly the semantics of eval
. The question of how things work internally to preserve this semantic illusion while being reasonably fast is an implementation detail; a Lisp system has an obligation to expose a function eval
to the programmer and to act as if Lisp programmes were being processed by that function.
In modern Lisp systems, it is a part of eval
's contract that it performs an additional preprocessing phase during which macros are expanded prior to evaluating the code (or compiling and running, as the case may be). That particular facility is not a necessary part of a Lisp system, but it is just so easy to plug it into this execution model! Also, I wonder if this isn't the only execution model which makes the Lisp kind of macro transformations manageable, which would mean that any language seeking to incorporate Lisp-style macros would have to adopt a similar execution model. My intuition tells me that this is indeed the case.
Of course once a language is written down in notation directly paralleling its ASTs and uses a Lisp-like execution model with an evaluator function / object, one must wonder if it isn't by any chance another dialect of Lisp... even if its AST-paralleling syntax happens to be XML-based. shudder
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With