Common Lisp parser that preserves comments

Tags: common-lisp

Common Lisp comes with a parser (reader) that converts its textual syntax to s-expressions. However, it discards comments, making it unsuitable for tools that round-trip Lisp code.

Is there an existing parser for Common Lisp that preserves comments?

asked Dec 24 '22 04:12 by rwallace


2 Answers

As noted in a comment, you should be able to modify the readtable to bind the macro character ; to a different reader macro function. For example, if you define:

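(defun semicolon-reader (stream char)
  ;; Return the comment, including its leading semicolon, as a tagged list.
  (list 'my-comment
        (concatenate 'string (string char)
                     ;; Use "" as the EOF value: CONCATENATE needs a
                     ;; sequence, and a character such as #\Newline is not one.
                     (read-line stream nil "" t))))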

and then at toplevel run:

> (set-macro-character #\; #'semicolon-reader)
> (read)

the user input:

(a b ; is b
c ; is c
)

will generate:

(A B (MY-COMMENT "; is b") C (MY-COMMENT "; is c"))
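Calling set-macro-character at toplevel changes the readtable for the whole session. A minimal sketch of the same idea scoped to a private copy of the standard readtable follows; the names comment-reader and read-preserving-comments are made up for this illustration and are not part of the answer.

```lisp
;; Hypothetical helpers for illustration only.
(defun comment-reader (stream char)
  "Return (MY-COMMENT \"; text\") for a semicolon comment."
  (list 'my-comment
        (concatenate 'string (string char)
                     ;; "" on EOF so CONCATENATE always receives a string
                     (read-line stream nil "" t))))

(defun read-preserving-comments (string)
  "Read one form from STRING with ; rebound in a private readtable."
  ;; (copy-readtable nil) returns a fresh copy of the standard readtable,
  ;; so the global *readtable* is left untouched.
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\; #'comment-reader)
    (read-from-string string)))

(read-preserving-comments (format nil "(a b ; is b~%c ; is c~%)"))
;; => (A B (MY-COMMENT "; is b") C (MY-COMMENT "; is c"))
```

Because the binding of *readtable* is dynamic, ordinary reads outside read-preserving-comments still discard comments as usual.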

However, real round trip processing is also going to require you to preserve whitespace. I don't know enough about the reader to know if you could get away with defining some clever macro functions for whitespace characters, or if you'd have to write some kind of preprocessing function to pre-quote runs of whitespace with another macro character and then handle it similarly to semicolon-reader above.

answered Feb 15 '23 10:02 by K. A. Buhr


Since you're making a code formatter, you might be able to get away with supporting only standard Common Lisp syntax (with-standard-io-syntax) and modifying the standard readtable to preserve comments and whitespace (set-macro-character). The readtable is the main data structure of the CL reader and tells it which function to call to read different kinds of objects when it encounters a particular character in source code (e.g. how to read a list when it encounters an opening parenthesis).
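To make the readtable idea concrete, here is a small sketch that asks the standard readtable how it classifies a few characters, using get-macro-character. The function name describe-char-syntax is an assumption made for this example.

```lisp
;; DESCRIBE-CHAR-SYNTAX is a hypothetical helper, not a standard function.
(defun describe-char-syntax (char)
  "Classify CHAR in the standard readtable as :TERMINATING-MACRO,
:NON-TERMINATING-MACRO, or :OTHER (constituent, whitespace, etc.)."
  (with-standard-io-syntax
    ;; GET-MACRO-CHARACTER returns the reader function (or NIL) and a
    ;; second value that is true for non-terminating macro characters.
    (multiple-value-bind (fn non-terminating-p) (get-macro-character char)
      (cond ((null fn) :other)
            (non-terminating-p :non-terminating-macro)
            (t :terminating-macro)))))

(mapcar #'describe-char-syntax '(#\( #\; #\# #\a #\Space))
;; => (:TERMINATING-MACRO :TERMINATING-MACRO :NON-TERMINATING-MACRO :OTHER :OTHER)
```

Note that #\Space has no reader function by default, which is why preserving whitespace means installing one with set-macro-character.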

You have to use either gensyms or structs/classes to represent comments and whitespace, since other kinds of objects (e.g. lists and non-gensym symbols) can be read in from a source file by the Lisp reader using standard IO syntax.
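A short sketch of why a plain list is a risky encoding: source code that happens to contain a form headed by the same symbol looks exactly like a preserved comment, while a struct instance is a distinct type. The names comment-node and list-comment-p are hypothetical and exist only for this illustration.

```lisp
;; COMMENT-NODE and LIST-COMMENT-P are hypothetical names for illustration.
(defstruct comment-node text)

(defun list-comment-p (x)
  "True if X looks like the list encoding (MY-COMMENT \"...\")."
  (and (consp x) (eq (first x) 'my-comment)))

;; Ordinary source code that happens to call a function named MY-COMMENT.
(defparameter *source-form*
  (read-from-string "(my-comment \"just a function call\")"))

(list-comment-p *source-form*)  ; => T, a false positive
(comment-node-p *source-form*)  ; => NIL, a list is never a COMMENT-NODE
```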

Below is a quick proof of concept. The reader works fine, but I couldn't get the printer to work, i.e. to re-print what we got from the reader and produce a source file close enough to the input. It prints extra whitespace around our whitespace, probably because it treats our whitespace objects like normal Lisp objects (symbols, lists, etc.) that must be delimited by whitespace when several are printed in a row (e.g. printing 1 and 2 and 3 should give 1 2 3, not 123). Diving into the guts of the Common Lisp printer to figure out how to override this behavior is left as an exercise for the reader :p

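Also, peruse section 2.4, Standard Macro Characters, of the Common Lisp HyperSpec. Section 2.4.8, Sharpsign, lists all the syntax that starts with #. Beware especially of the read-time conditionals #+ and #- and of the read-time evaluation syntax #. since all three change what the reader returns before your tool ever sees the text.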
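The following sketch shows why these three dispatch macros matter for round-tripping: #+ and #- silently drop text, and #. runs code while reading. The helper name read-safely and the feature name no-such-feature-xyz are assumptions for this example; the feature is presumed absent from *features*.

```lisp
;; #+ drops the next form at read time when the feature is absent, so the
;; original text "#+no-such-feature-xyz b" cannot be recovered afterwards.
(read-from-string "(a #+no-such-feature-xyz b c)")
;; => (A C)

;; #. evaluates its form at read time when *READ-EVAL* is true.
(let ((*read-eval* t))
  (read-from-string "#.(+ 1 2)"))
;; => 3

;; READ-SAFELY is a hypothetical helper: with *READ-EVAL* bound to NIL,
;; the reader signals a READER-ERROR on #. instead of evaluating.
(defun read-safely (string)
  (handler-case (let ((*read-eval* nil))
                  (read-from-string string))
    (reader-error () :refused)))

(read-safely "#.(+ 1 2)")
;; => :REFUSED
```

A round-tripping tool would probably need its own reader functions for these characters, installed with set-dispatch-macro-character, so that the source text is kept rather than interpreted.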

If you ever get this to work well on real code, please consider publishing it as an open-source package.

(defstruct comment style string)
(defstruct whitespace string)

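;; DEFVAR rather than DEFCONSTANT: re-evaluating a DEFCONSTANT whose new
;; value is a fresh list or gensym (not EQL to the old one) has undefined
;; consequences and signals an error in some implementations.
(defvar +whitespace-chars+ '(#\Space #\Tab #\Return #\Newline))
(defvar +eof+ (gensym "EOF"))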

(defun read-semicolon-comment (stream semicolon)
  (declare (ignore semicolon))
  (make-comment :style :semicolon :string
    (with-output-to-string (comment)
      (loop (let ((char (read-char stream nil +eof+ t)))
              (cond ((equal char +eof+) (return))
                ((equal char #\Newline)
                  (unread-char char stream)
                  (return))
                (t (write-char char comment))))))))

(defun read-whitespace (stream first-char)
  (make-whitespace :string
    (with-output-to-string (whitespace)
      (write-char first-char whitespace)
      (loop (let ((char (read-char stream nil +eof+ t)))
              (unless (member char +whitespace-chars+)
                (unless (equal char +eof+) (unread-char char stream))
                (return))
              (write-char char whitespace))))))

(defun read-stream (stream)
  (with-standard-io-syntax ; Here's a comment, for example.
    (let ((*readtable* (copy-readtable)))
      (set-macro-character #\; #'read-semicolon-comment)
      (dolist (char +whitespace-chars+)
        (set-macro-character char #'read-whitespace))
      (loop for x = (read stream nil +eof+) until (equal x +eof+)
        collect x))))

(defmethod print-object ((x comment) stream)
  (assert (equal :semicolon (comment-style x)))
  (write-char #\; stream)
  (write-string (comment-string x) stream)
  x)

(defmethod print-object ((x whitespace) stream)
  (write-string (whitespace-string x) stream)
  x)

(mapc #'prin1 (read-stream *standard-input*))
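One possible way around the printer problem is to skip the standard list printer and walk the tree by hand. The sketch below is an assumption-laden illustration, not a finished solution: unparse is a made-up name, the two defstructs are repeated from the code above only so the block runs on its own, and only proper lists are handled.

```lisp
;; Repeated from the code above so that this block is self-contained.
(defstruct comment style string)
(defstruct whitespace string)

;; UNPARSE is a hypothetical helper, not part of the answer's code.
(defun unparse (object stream)
  "Write OBJECT to STREAM, emitting preserved whitespace and comments
verbatim and adding no separators of its own between list elements."
  (typecase object
    (whitespace (write-string (whitespace-string object) stream))
    (comment (write-char #\; stream)
             (write-string (comment-string object) stream))
    (cons (write-char #\( stream)
          ;; Assumes a proper list; dotted pairs are not handled.
          (dolist (element object)
            (unparse element stream))
          (write-char #\) stream))
    ;; Symbols, numbers, strings, NIL, etc. go through the normal printer.
    (t (prin1 object stream)))
  object)

(with-output-to-string (out)
  (unparse (list 'a (make-whitespace :string " ")
                 'b (make-whitespace :string " ")
                 (make-comment :style :semicolon :string " is b")
                 (make-whitespace :string (string #\Newline)))
           out))
;; => "(A B ; is b
;; )"
```

With the reader above, something like (dolist (x (read-stream in)) (unparse x *standard-output*)) would be the matching driver. Known gaps in this sketch: symbols come back in upper case because the reader already folded them, and forms built by other reader macros carry no whitespace objects, so 'x, which reads as the list (QUOTE X), would be written as (QUOTEX). Those macro characters would need the same treatment as ; and whitespace.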

answered Feb 15 '23 10:02 by Lassi