Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correct way of parsing S-expressions in OOP

I am looking for a way to implement an S-expression reader (to be used later on with both an Scheme interpreter and a compiler), but I've been asking myself how (if at all) I should write an AST for it.

I've been reading SICP, and this is quite straightforward from within Scheme, but I'm looking to implement the interpreter and compiler in C++, in OO fashion.

Please, keep in mind that I'm doing this only for learning purposes, so I'm not really looking for the easiest or quickest way of doing this, but rather the correct and reusable way of doing it.

I've seen in some Scheme implementations that people parse s-expressions and readily output cons cells, something like this:

  struct Sexpr
  {
  };

  struct Cons : public Sexpr
  {
    Sexpr* left;
    Sexpr* right;
  };

  struct IntAtom : Sexpr
  {
    int value;
  };

And one subclass of Sexpr for each kind of Scheme Atom, or something along those lines.

I'm not sure, but this seems like a hack to me... Shouldn't this work be done by an interpreter rather than the reader?

What I want to know is if this is considered the best (or correct) way of reading S-expressions, or is this more a job of the interpreter than the parser? Should the parser have its own AST instead of relying on cons cells?

like image 461
ivanmp Avatar asked Feb 24 '12 17:02

ivanmp


1 Answers

To follow up from the Scheme/Racket side of the fence:

Racket (and some other Scheme implementations) use a richer representation for syntax objects, so that they can have properties attached to them indicating (in Racket, at least) what context they're bound in, what source location they come from, what pass of the compiler inserted them, and any other information you might want to store (cf. "syntax properties" in Racket).

This additional information enables things like error messages with pointers to source, and hygienic macros.

Please note that I mean "richer" here simply in the "contains more values" sense, not in any non-value-neutral way.

I should also add---before falling into the Turing Tar Pit---that you can also represent this exact same information using a table on the side; assuming you have pointer comparisons, there's no expressiveness difference between putting a value inside a structure and using a table to associate the structure with the value.

like image 143
John Clements Avatar answered Sep 23 '22 17:09

John Clements