Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The purpose of Lisp syntax to model AST

Tags:

lisp

Lisp syntax represents AST as far as I know, but in high level format to allow human to easily read and modify, at the same time make it easy for the machine to process the source code as well.

For this reason, in Lisp, it is said that code is data and data is code, since code (s-epxression) is just AST, in essence. We can plug in more ASTs (which is our data, which is just lisp code) into other ASTs (lisp code) or independently to extend its functionality and manipulate it on the fly (runtime) without having to recompile the whole OS to integrate new code.In other languages, we have to recompile from to turn the human-language source code into valid AST before it is compiled into code.

Is this the reason for Lisp syntax to be designed like it is (represent an AST but is human readable, to satisfy both human and the machine) in the first place? To enable stronger (on the fly - runtime) as well as simpler (no recompile, faster) communication between man-machine?

I heard that the Lisp machine only has a single address space which holds all data. In operating system like Linux, the programmers only have virtual address space and pretend it to be the real physical address space and can do whatever they want. Data and code in Linux are separated regions, because effectively, data is data and data is code. In normal OS written in C (or C like language), it would be very messy if we only operate a single address space for the whole system and mixing data with code would be very messy.

In Lisp Machine, since code is data and data is code, is this the reason for this to have only a single address space (without the virtual layer)? Since we have GC and no pointer, should it be safe to operate on physical memory without breaking it (since having only 1 single space is a lot less complicated)?

EDIT: I ask this because it is said that one of the advantage of Lisp is single address space:

A safe language means a reliable environment without the need to separate tasks out into their own separate memory spaces.

The "clearly separated process" model characteristic of Unix has potent merits when dealing with software that might be unreliable to the point of being unsafe, as is the case with code written in C or C++ , where an invalid pointer access can "take down the system." MS-DOS and its heirs are very unreliable in that sense, where just about any program bug can take the whole system down; "Blue Screen of Death" and the likes.

If the whole system is constructed and coded in Lisp, the system is as reliable as the Lisp environment. Typically this is quite safe, as once you get to the standards-compliant layers, they are quite reliable, and don't offer direct pointer access that would allow the system to self-destruct.

Third Law of Sane Personal Computing

Volatile storage devices (i.e. RAM) shall serve exclusively as read/write cache for non-volatile storage devices. From the perspective of all software except for the operating system, the machine must present a single address space which can be considered non-volatile. No computer system obeys this law which takes longer to fully recover its state from a disruption of its power source than an electric lamp would.

Single address space, as it is stated, holds all the running processes in the same memory space. I am just curious why people insist that single address space is better. I relate it to the AST like syntax of Lisp, to try to explain how it fits the single space model.

like image 804
Amumu Avatar asked Jul 04 '12 05:07

Amumu


1 Answers

Your question doesn't reflect reality very accurately, especially in the part about code/data separation in Linux and other OS'es. Actually, this separation is enforced not at the OS level, but by the compiler/program loader. At the OS level there are just memory pages that can have different protection bits set (like executable, read-only etc), and above this level different executable formats exist (like ELF in Linux) that specify restrictions on different parts of program memory.

Returning to Lisp, as far as I know, historically, the S-expression format was used by Lisp creators, because they wanted to concentrate on the semantics of the language, putting syntax aside for some time. There was a plan to eventually create some syntax for Lisp (see M-expressions), and there were some Lisp-based languages which had some more syntax, like Dylan. But, overall, the Lisp community had come to the consensus, that the benefits of S-expressions outweight their cons, so they had stuck.

Regarding code as data, this is not strictly bound to S-expressions, as other code can as well be treated as data. This whole approach is called meta-programming and is supported at different levels and with different mechanisms by many languages. Every language, that supports eval (Perl, JavaScript, Python) allows to treat code as data, just the representation is almost always a string, while in Lisp it is a tree, which is much much more convenient and facilitates advanced stuff, like macros.

like image 179
Vsevolod Dyomkin Avatar answered Oct 12 '22 09:10

Vsevolod Dyomkin