
Writing a Z80 assembler - lexing ASM and building a parse tree using composition?

I'm very new to the concept of writing an assembler and even after reading a great deal of material, I'm still having difficulties wrapping my head around a couple of concepts.

  1. What is the process for actually breaking a source file into tokens? I believe this is called lexing. I've searched high and low for real code examples that make sense but can't find a thing, so simple code examples are very welcome ;)

  2. When parsing, does information ever need to be passed up or down the tree? The reason I ask is as follows, take:

    LD BC, nn

It needs to be turned into the following parse tree once tokenized (???):

  ___ LD ___
  |        |
 BC        nn

Now, when this tree is traversed it needs to produce the following machine code:

    01 n n

If the instruction had been:

    LD DE, nn

Then the output would need to be:

    11 n n

This raises the question: does the LD node return something different based on the operand, or is it the operand that returns something? And how is this achieved? More simple code examples would be excellent if time permits.

I'm most interested in learning some of the raw processes here rather than looking at advanced existing tools so please bear that in mind before sending me to Yacc or Flex.

Gary Paluk asked Aug 20 '09 09:08



1 Answer

Well, the structure of the tree you really want for an instruction that operates on a register and a memory addressing mode involving an offset displacement and an index register would look like this:

    INSTRUCTION-----+
    |      |        |
  OPCODE  REG     OPERAND
                  |     |
                OFFSET  INDEXREG
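
As a rough sketch of how source text could be turned into a tree of this shape, the following Python tokenizes one line with regular expressions and then parses the single form `LD A,(IX+5)` by hand. The token classes, the function names `tokenize` and `parse_instruction`, and the dict layout are all assumptions made for illustration, and only this one operand form is handled; a real assembler would need many more cases.

```python
import re

# Hypothetical token classes for a tiny subset of Z80 assembler syntax.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9][0-9A-Fa-f]*[Hh]|\d+"),   # 0FFH style hex, or decimal
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),      # mnemonics, registers, labels
    ("PUNCT",  r"[,()+\-]"),
    ("SKIP",   r"[ \t]+"),                      # whitespace is discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(line):
    """Lexing: split one source line into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group().upper()))
        pos = m.end()
    return tokens


def parse_instruction(tokens):
    """Parsing: accept OPCODE REG ',' '(' INDEXREG '+' OFFSET ')' and build a tree."""
    toks = iter(tokens)

    def expect(kind, text=None):
        tok = next(toks, None)
        if tok is None or tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok}")
        return tok[1]

    opcode = expect("IDENT")
    reg = expect("IDENT")
    expect("PUNCT", ",")
    expect("PUNCT", "(")
    index_reg = expect("IDENT")
    expect("PUNCT", "+")
    offset = expect("NUMBER")
    expect("PUNCT", ")")
    # Nested dicts stand in for the INSTRUCTION / OPERAND nodes of the diagram.
    return {"opcode": opcode, "reg": reg,
            "operand": {"offset": offset, "indexreg": index_reg}}


tree = parse_instruction(tokenize("ld a,(ix+5)"))
```

The lexer knows nothing about instructions; it only classifies characters. The parser is the part that imposes structure, which is why the two are usually kept separate.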

And yes, you want to pass values up and down the tree. A method for formally specifying such value passing is called "attribute grammars": you decorate the grammar for your language (in your case, your assembler syntax) with the value passing and the computations over those values. For more background, see Wikipedia on attribute grammars.
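
For the `LD BC, nn` versus `LD DE, nn` case in the question, one way to picture values flowing up the tree is the hand-coded Python sketch below. It is not a full attribute grammar evaluator, and the class names `RegPair`, `Immediate16` and `Load` are hypothetical. The register-pair operand supplies a small code upward, and the LD node combines it with the base opcode, relying on the Z80 encoding `00dd0001` for `LD dd,nn` (BC=0, DE=1, HL=2, SP=3) followed by the 16-bit value, low byte first.

```python
from dataclasses import dataclass

# Register-pair field "dd" in the Z80 opcode pattern 00dd0001 (LD dd,nn).
PAIR_CODE = {"BC": 0, "DE": 1, "HL": 2, "SP": 3}


@dataclass
class RegPair:
    name: str

    def pair_code(self):
        # Value passed up the tree: the operand describes itself,
        # it does not know which instruction is using it.
        return PAIR_CODE[self.name]


@dataclass
class Immediate16:
    value: int

    def encode(self):
        # The Z80 stores 16-bit immediates low byte first.
        return bytes([self.value & 0xFF, (self.value >> 8) & 0xFF])


@dataclass
class Load:
    dest: RegPair
    src: Immediate16

    def encode(self):
        # The LD node combines what its children report into the final bytes.
        opcode = 0x01 | (self.dest.pair_code() << 4)
        return bytes([opcode]) + self.src.encode()


print(Load(RegPair("BC"), Immediate16(0x1234)).encode().hex())  # 013412
print(Load(RegPair("DE"), Immediate16(0x1234)).encode().hex())  # 113412
```

So under this arrangement the answer to "which node returns something different" is both: the operand returns a value that depends on the register, and the LD node returns bytes that depend on that value.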

In a related question you asked, I discussed a tool, DMS, which handles expression grammars and builds trees. As a language manipulation tool, DMS faces exactly these same issues of information flowing up and down the tree. It shouldn't surprise you that, as a high-end language manipulation tool, it can handle attribute grammar computations directly.

Ira Baxter answered Sep 29 '22 03:09
