Design Pattern For Making An Assembler

I'm making an 8051 assembler.

First there is a tokenizer, which reads the next token, sets error flags, recognizes EOF, etc.
Then there is the main loop of the compiler, which reads tokens and checks for valid mnemonics:

mnemonic = NextToken();
if (mnemonic.Error)
{
    //throw some error
}
else if (mnemonic.Text == "ADD")
{
    ...
}
else if (mnemonic.Text == "ADDC")
{
    ...
}

And it continues like this for many more cases. Worse is the code inside each case, which checks for valid parameters and then converts them to compiled code. Right now it looks like this:

if (mnemonic.Text == "MOV")
{
    arg1 = NextToken();
    if (arg1.Error) { /* throw error */ break; }
    arg2 = NextToken();
    if (arg2.Error) { /* throw error */ break; }

    if (arg1.Text == "A")
    {
        if (arg2.Text == "B")
            output << 0x1234; //Example compiled code
        else if (arg2.Text == "@B")
            output << 0x5678; //Example compiled code
        else
        {
            /* throw "Invalid parameters" */
        }
    }
    else if (arg1.Text == "B")
    {
        if (arg2.Text == "A")
            output << 0x9ABC; //Example compiled code
        else if (arg2.Text == "@A")
            output << 0x0DEF; //Example compiled code
        else
        {
            /* throw "Invalid parameters" */
        }
    }
}

For each mnemonic I have to check for valid parameters and then emit the correct compiled code. Very similar parameter-checking code is repeated in every case.

So is there a design pattern for improving this code?
Or simply a simpler way to implement this?

Edit: I accepted plinth's answer, thanks to him. Still, if you have ideas on this, I will be happy to learn them. Thanks all.

323

asked Apr 07 '11 19:04

Hossein

1 Answer

I've written a number of assemblers over the years doing hand parsing and frankly, you're probably better off using a grammar language and a parser generator.

Here's why - a typical assembly line will probably look something like this:

[label:] [instruction|directive][newline]

and an instruction will be:

plain-mnemonic|mnemonic-withargs

and a directive will be:

plain-directive|directive-withargs

etc.
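To make the line shape above concrete, here is a minimal hand-written sketch in C++ that splits one source line into its label, mnemonic, and operand text. The names `ParsedLine`, `Trim`, and `SplitLine` are hypothetical, and the sketch assumes that `;` starts a comment and that a label is whatever precedes the first `:`. It does not handle string literals that contain those characters, which is one of the things a generated parser would handle for you.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type: the parts of "[label:] [mnemonic [operands]]".
struct ParsedLine {
    std::string label;     // empty if the line has no label
    std::string mnemonic;  // empty for blank or label-only lines
    std::string operands;  // rest of the line with the comment removed
};

// Remove leading and trailing whitespace.
static std::string Trim(const std::string &s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split one source line. Assumption: ';' starts a comment and the text
// before the first ':' is a label.
ParsedLine SplitLine(const std::string &line) {
    ParsedLine out;
    // Drop the comment (find returns npos when there is none, which keeps the whole line).
    std::string rest = Trim(line.substr(0, line.find(';')));

    std::size_t colon = rest.find(':');
    if (colon != std::string::npos) {
        out.label = Trim(rest.substr(0, colon));
        rest = Trim(rest.substr(colon + 1));
    }

    // The mnemonic (or directive) is the first whitespace-delimited word.
    std::size_t i = 0;
    while (i < rest.size() && !std::isspace(static_cast<unsigned char>(rest[i]))) ++i;
    out.mnemonic = rest.substr(0, i);
    out.operands = Trim(rest.substr(i));
    return out;
}
```

Under these assumptions, `SplitLine("loop: MOV A, B ; copy")` yields the label `loop`, the mnemonic `MOV`, and the operand text `A, B`, which can then be handed to whatever checks the operands.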

With a decent parser generator like Gold, you should be able to knock out a grammar for 8051 in a few hours. The advantage over hand parsing is that you can support fairly complicated expressions in your assembly code, like:

.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)

which can be a real bear to do by hand.
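For a sense of what the hand-written version involves, here is a rough recursive-descent sketch in C++ that evaluates such an expression once the leading `#` has been stripped. The class `ExprEvaluator` and the function `EvaluateExpression` are hypothetical names, and the sketch assumes the `.define` values have already been collected into a symbol map. It covers only `+ - * /`, unary minus, and parentheses.

```cpp
#include <cctype>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical evaluator for operand expressions over numeric literals
// and previously defined names.
class ExprEvaluator {
public:
    ExprEvaluator(const std::string &text,
                  const std::map<std::string, std::int64_t> &symbols)
        : text_(text), symbols_(symbols), pos_(0) {}

    std::int64_t Evaluate() {
        std::int64_t v = ParseSum();
        SkipSpaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected character");
        return v;
    }

private:
    // sum := product (('+' | '-') product)*
    std::int64_t ParseSum() {
        std::int64_t v = ParseProduct();
        for (;;) {
            if (Accept('+')) v += ParseProduct();
            else if (Accept('-')) v -= ParseProduct();
            else return v;
        }
    }

    // product := atom (('*' | '/') atom)*
    std::int64_t ParseProduct() {
        std::int64_t v = ParseAtom();
        for (;;) {
            if (Accept('*')) {
                v *= ParseAtom();
            } else if (Accept('/')) {
                std::int64_t d = ParseAtom();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }

    // atom := '(' sum ')' | '-' atom | number | name
    std::int64_t ParseAtom() {
        if (Accept('(')) {
            std::int64_t v = ParseSum();
            if (!Accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        if (Accept('-')) return -ParseAtom();

        std::size_t start = pos_;  // Accept() has already skipped spaces
        while (pos_ < text_.size() &&
               (std::isalnum(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '_'))
            ++pos_;
        std::string word = text_.substr(start, pos_ - start);
        if (word.empty()) throw std::runtime_error("expected a number or a name");

        if (std::isdigit(static_cast<unsigned char>(word[0]))) {
            std::size_t used = 0;
            std::int64_t v = std::stoll(word, &used, 0);  // base 0 accepts the 0x prefix
            if (used != word.size()) throw std::runtime_error("bad number: " + word);
            return v;
        }
        std::map<std::string, std::int64_t>::const_iterator it = symbols_.find(word);
        if (it == symbols_.end()) throw std::runtime_error("undefined symbol: " + word);
        return it->second;
    }

    void SkipSpaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    // Consume c if it is the next non-space character.
    bool Accept(char c) {
        SkipSpaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    std::string text_;
    const std::map<std::string, std::int64_t> &symbols_;
    std::size_t pos_;
};

std::int64_t EvaluateExpression(const std::string &text,
                                const std::map<std::string, std::int64_t> &symbols) {
    return ExprEvaluator(text, symbols).Evaluate();
}
```

Even this small subset needs three mutually recursive functions plus error handling, and it still lacks shifts, bitwise operators, and forward references to labels. With a grammar, operator precedence becomes a few declarative rules instead.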

If you want to do it by hand, make a table of all your mnemonics. For each mnemonic, record the addressing modes it supports and, for each addressing mode, the opcode and the number of bytes that variant takes. Something like this:

enum AddressingMode {
    Implied = 1, Direct = 2, Extended = 4, Indexed = 8 // etc
};

/* for a 4 char mnemonic, this struct will be 5 bytes.  A typical small processor
 * has on the order of 100 instructions, making this table come in at ~500 bytes when all
 * is said and done.
 * The time to binary search that will be, worst case 8 compares on the mnemonic.
 * I claim that I/O will take way more time than look up.
 * You will also need a table and/or a routine that given a mnemonic and addressing mode
 * will give you the actual opcode.
 */

struct InstructionInfo {
    char Mnemonic[4];
    char AddressingMode; /* bitwise OR of the modes the instruction supports */
};

/* order them by mnemonic */
static InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'}, Direct|Extended|Indexed }
}; /* etc */

static int nInstrs = sizeof(instrs)/sizeof(InstructionInfo);

InstructionInfo *GetInstruction(char *mnemonic) {
   /* binary search for mnemonic */
}

int InstructionSize(AddressingMode mode)
{
    switch (mode) {
    case Implied: return 1;
    /* etc */
    default: return 0; /* unknown mode */
    }
}

With that, you have a list of every instruction, each of which carries the set of addressing modes it allows.
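The binary search left as a comment in `GetInstruction` could be filled in roughly as follows. This is a sketch that assumes the table stays sorted by mnemonic and that unused mnemonic bytes are padded with `'\0'`. The field name `AddressingModes` and the helper `SupportsMode` are hypothetical, and the table repeats the four example entries from above rather than a real instruction set.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

enum AddressingMode { Implied = 1, Direct = 2, Extended = 4, Indexed = 8 };

struct InstructionInfo {
    char Mnemonic[4];      // padded with '\0'; not terminated when all 4 chars are used
    char AddressingModes;  // bitwise OR of the AddressingMode values the instruction accepts
};

// Must stay sorted by mnemonic for the binary search below.
static const InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct | Extended | Indexed },
    { {'A', 'D', 'D', 'A'},  Direct | Extended | Indexed },
    { {'S', 'U', 'B', '\0'}, Direct | Extended | Indexed },
    { {'S', 'U', 'B', 'A'},  Direct | Extended | Indexed },
};

// Binary search for a null-terminated mnemonic; returns nullptr if it is unknown.
const InstructionInfo *GetInstruction(const char *mnemonic) {
    // Longer names can never match a 4-byte entry, and strncmp would
    // otherwise accept them on their first four characters.
    if (std::strlen(mnemonic) > 4) return nullptr;

    const InstructionInfo *first = std::begin(instrs);
    const InstructionInfo *last = std::end(instrs);
    const InstructionInfo *it = std::lower_bound(
        first, last, mnemonic,
        [](const InstructionInfo &entry, const char *key) {
            return std::strncmp(entry.Mnemonic, key, 4) < 0;
        });
    if (it != last && std::strncmp(it->Mnemonic, mnemonic, 4) == 0) return it;
    return nullptr;
}

// True if the instruction accepts the given addressing mode.
bool SupportsMode(const InstructionInfo &info, AddressingMode mode) {
    return (info.AddressingModes & mode) != 0;
}
```

`std::lower_bound` performs the binary search, and the final `strncmp` confirms that the entry it landed on is an exact match rather than just the next larger mnemonic.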

So your parser becomes something like this:

char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
    AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
    if (IsValidInstruction(mnemonic, info)) {
        GenerateCode(mnemonic, info);
    }
    else throw BadInstructionException(mnemonic, info);
}
else if (IsDirective(mnemonic, mnemonicLen)) { /* etc. */ }
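One possible shape for the `IsValidInstruction` and `GenerateCode` calls above is a second table keyed by mnemonic and addressing mode, which is the opcode table the earlier comment mentions. The sketch below is an assumption-laden simplification: it passes a single mode and one operand byte instead of the `AddressingModeInfo` used above, and `Encoding`, `Key`, and `kEncodings` are hypothetical names. The three opcode values are intended to match the 8051 encodings for `NOP`, `RET`, and `ADD A, direct`, but check them against the instruction set reference before relying on them.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum AddressingMode { Implied = 1, Direct = 2, Extended = 4, Indexed = 8 };

// Hypothetical record: how one (mnemonic, mode) variant is encoded.
struct Encoding {
    std::uint8_t opcode;
    int size;  // total bytes, including the opcode
};

typedef std::pair<std::string, AddressingMode> Key;

// Illustrative entries only; a real table lists every variant.
static const std::map<Key, Encoding> kEncodings = {
    { {"NOP", Implied}, {0x00, 1} },
    { {"RET", Implied}, {0x22, 1} },
    { {"ADD", Direct},  {0x25, 2} },  // ADD A, direct
};

// A variant is valid exactly when the table has an entry for it.
bool IsValidInstruction(const std::string &mnemonic, AddressingMode mode) {
    return kEncodings.count(Key(mnemonic, mode)) != 0;
}

// Append the encoded instruction to output. The operand byte is only
// emitted for two-byte variants.
void GenerateCode(const std::string &mnemonic, AddressingMode mode,
                  std::uint8_t operand, std::vector<std::uint8_t> &output) {
    std::map<Key, Encoding>::const_iterator it = kEncodings.find(Key(mnemonic, mode));
    if (it == kEncodings.end())
        throw std::invalid_argument("bad instruction: " + mnemonic);
    output.push_back(it->second.opcode);
    if (it->second.size == 2) output.push_back(operand);
}
```

With this layout, supporting a new instruction means adding rows to the table rather than another branch of nested `if` statements, which is the repetition the question complains about.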

167

answered Oct 30 '22 06:10

plinth

Tags: c++, design-patterns, assembly, compiler-construction
