
ARM Unified Assembler Language grammar and parser?

Is there a publicly available grammar or parser for ARM's Unified Assembler Language (UAL), as described in section A4.2 of the ARM Architecture Reference Manual?

This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax provides a canonical form for all ARM and Thumb instructions.

UAL describes the syntax for the mnemonic and the operands of each instruction.

Put simply, I'm interested in code that parses the mnemonic and the operands of each instruction. For example, how could you define a grammar for these lines?

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>
IT{<x>{<y>{<z>}}}{<q>} <firstcond>
LDC{L}<c> <coproc>, <CRd>, [<Rn>, #+/-<imm>]{!}
Asked by auselen, Oct 04 '22 00:10


1 Answer

If you need to create a simple parser based on an example-based grammar, nothing beats ANTLR:

http://www.antlr.org/

ANTLR translates a grammar specification into lexer and parser code. It's much more intuitive to use than Lex and Yacc. The grammar below covers part of what you specified above, and it's fairly easy to extend to do what you want:

grammar armasm;

/* Rules */
program: (statement | NEWLINE) +;

statement: (ADC (reg ',')? reg ',' reg ',' reg
    | IT firstcond
    | LDC coproc ',' cpreg (',' reg ','  imm )? ('!')? ) NEWLINE;

reg: 'r' INT;
coproc: 'p' INT;
cpreg: 'cr' INT;
imm: '#' ('+' | '-')? INT;
firstcond: '?';

/* Tokens */
ADC: 'ADC' ('S')? ; 
IT:   'IT';
LDC:  'LDC' ('L')?;

INT: [0-9]+;
NEWLINE: '\r'? '\n';
WS: [ \t]+ -> skip;

From the ANTLR site (OSX instructions):

$ cd /usr/local/lib
$ wget http://antlr4.org/download/antlr-4.0-complete.jar
$ export CLASSPATH=".:/usr/local/lib/antlr-4.0-complete.jar:$CLASSPATH"
$ alias antlr4='java -jar /usr/local/lib/antlr-4.0-complete.jar'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'

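Then run ANTLR on the grammar file, compile the generated Java, and start the test rig, which reads the sample input from standard input:

$ antlr4 armasm.g4
$ javac *.java
$ grun armasm program -tree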

    ADCS r1, r2, r3
    IT ?
    LDC p3, cr2, r1, #-3 !
    <EOF>

This yields the parse tree, broken down into rules and tokens:

(program (statement ADCS (reg r 1) , (reg r 2) , (reg r 3) \n) (statement IT (firstcond ?) \n) (statement LDC (coproc p 3) , (cpreg cr 2) , (reg r 1) , (imm # - 3) ! \n))

The grammar doesn't yet include the instruction condition codes or the details of the IT instruction (I'm pressed for time). ANTLR generates a lexer and parser, and the grun alias runs them in ANTLR's TestRig so I can feed text snippets through the generated code. The generated API is straightforward to use in your own applications.
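
One possible way to cover the missing condition codes and IT suffixes is to decode the mnemonic separately from the operand grammar. The following is only a minimal sketch in plain Java (it does not use ANTLR), assuming the mnemonic has already been split off as a single word. The class name UalMnemonic and the decode method are hypothetical names chosen for illustration; the condition codes and the .N/.W qualifiers are the ones the UAL templates in the question refer to as <c> and <q>.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR or of the generated parser.
final class UalMnemonic {
    // ARM condition codes; HS and LO are synonyms for CS and CC.
    static final String COND =
        "EQ|NE|CS|HS|CC|LO|MI|PL|VS|VC|HI|LS|GE|LT|GT|LE|AL";

    // ADC{S}{<c>}{<q>}: optional S flag, optional condition, optional .N/.W.
    static final Pattern ADC =
        Pattern.compile("ADC(S)?(" + COND + ")?(\\.[NW])?");

    // IT{<x>{<y>{<z>}}}{<q>}: up to three T/E slots, optional .N/.W.
    static final Pattern IT = Pattern.compile("IT([TE]{0,3})(\\.[NW])?");

    /** Returns a readable breakdown of the mnemonic, or null if unrecognised. */
    static String decode(String mnemonic) {
        String m = mnemonic.toUpperCase(Locale.ROOT);
        Matcher adc = ADC.matcher(m);
        if (adc.matches()) {
            return "ADC setflags=" + (adc.group(1) != null)
                + " cond=" + (adc.group(2) != null ? adc.group(2) : "AL")
                + " width=" + width(adc.group(3));
        }
        Matcher it = IT.matcher(m);
        if (it.matches()) {
            return "IT mask=" + it.group(1) + " width=" + width(it.group(2));
        }
        return null;
    }

    // Strips the leading dot from ".N"/".W"; "any" means no qualifier was given.
    private static String width(String qualifier) {
        return qualifier != null ? qualifier.substring(1) : "any";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ADCS", "ADCCS", "adcseq.w", "ITTE", "FOO"}) {
            System.out.println(s + " -> " + decode(s));
        }
    }
}
```

Because the optional S is tried before the condition code, ADCS decodes as ADC with the S flag, while ADCCS decodes as ADC with condition CS. The same split could instead be expressed as lexer rules in the grammar; the regex form is just the shortest way to show the idea.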

For completeness, I looked online for an existing grammar and didn't find one. Your best bet there might be to take apart gas (the GNU assembler) and extract its parser spec, but its default syntax is not UAL (it accepts UAL only after a .syntax unified directive) and it is GPL-licensed, if that matters to you. If you only need to handle a subset of the instructions, then a hand-written ANTLR grammar like the one above is a good way to go.

Answered by Joe P, Oct 13 '22 11:10