
Questions about AT&T x86 Syntax design

  1. Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?
  2. Why do all registers have a '%'?
  3. Is this just another attempt to get me to do a lot of lame typing?
  4. Also, am I the only one who finds 16(%esp) really counterintuitive compared to [esp+16]?
  5. I know it assembles to the same thing, but why would anyone want to type a lot of '$'s and '%'s without needing to? Why did GNU choose this syntax as the default?
  6. Another thing: why does every instruction in AT&T syntax end in an 'l'? I do know it's for the operand size, but why not just let the assembler figure that out? (Would I ever want to do a movl on operands that are not that size?)
  7. Last thing: why are the mov arguments inverted?

Isn't it more logical that:

eax = 5
mov eax, 5

whereas AT&T is:

mov 5, eax
5 = eax (? wait what ?)

Note: I'm not trying to troll. I just don't understand the design choices they made and I'm trying to get to know why they did what they did.

Skeen asked Nov 16 '10 11:11



2 Answers

1, 2, 3 and 5: the notation is somewhat redundant, but I find it to be a good thing when developing in assembly. Redundancy helps reading. The point about "let the assembler figure it out" easily turns into "let the programmer who reads the code figure it out", and I do not like it when I am the one doing the reading. Programming is not a write-only task; even the programmer himself must read his own code, and the syntax redundancy helps quite a bit.

Another point is that the '%' and '$' mean that new registers can be added without breaking backward compatibility: no problem in adding, e.g., a register called xmm4, as it will be written out as %xmm4, which cannot be confused with a variable called xmm4 which would be written without a '%'.
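The compatibility argument can be modelled in a few lines. This is only a sketch in Python, not how any real assembler is implemented; the register sets and the two `resolve_*` functions are made-up names for illustration.

```python
# Hypothetical model of how an assembler might decide what a name means.
OLD_REGISTERS = {"eax", "ebx", "esp"}
NEW_REGISTERS = OLD_REGISTERS | {"xmm4"}  # a later CPU generation adds xmm4


def resolve_unprefixed(name, registers):
    """Sigil-free style: a name is a register if the assembler knows it as one."""
    return "register" if name in registers else "symbol"


def resolve_prefixed(token):
    """AT&T style: the '%' decides, not the list of known registers."""
    return "register" if token.startswith("%") else "symbol"


# A program written for the old assembler uses a variable called xmm4.
print(resolve_unprefixed("xmm4", OLD_REGISTERS))  # symbol
print(resolve_unprefixed("xmm4", NEW_REGISTERS))  # register: the meaning changed
print(resolve_prefixed("xmm4"))                   # symbol, whatever the register list
print(resolve_prefixed("%xmm4"))                  # register
```

Without the prefix, the same source text changes meaning when the register list grows; with it, the old program keeps working.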

As for the amount of typing: normally, when programming in assembly, the bottleneck is the brain, not the hand. If the '$' and '%' slow you down, then either you are thinking much faster than is usually considered possible for a human being or, more probably, your task is too mechanical and should not be done in assembly; it should be left to an automatic code generator, colloquially known as a "C compiler".

The 'l' suffix was added to handle some situations where the assembler "cannot" figure it out. For instance, this code:

mov  [esp], 10 

is ambiguous, because it does not say whether you want to write a byte of value 10 or a 32-bit word with the same numerical value. The Intel syntax then calls for:

mov  byte ptr [esp], 10 

which is quite ugly, when you think about it. The people at AT&T wanted to make something more rational, so they came up with:

movb   $10, (%esp) 

and they preferred to be systematic, and have the 'b' (or 'l' or 'w') suffix everywhere. Note that the suffix is not always required. For instance, you can write:

mov   %al, (%ebx) 

and let the GNU assembler "figure out" that, since you are talking about '%al', the move is for a single byte. It really works! Still, I find it better to specify the size: it really helps the reader, and the programmer himself is the first and foremost reader of his own code.
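To see why the size matters, here is a rough Python model of what the differently sized stores do to four bytes of memory. It is a sketch, not assembler output: `store` and `stack` are hypothetical names, and the only x86 fact assumed is little-endian byte order.

```python
def store(memory, address, value, suffix):
    """Write `value` at `address` using the width named by an AT&T-style suffix."""
    width = {"b": 1, "w": 2, "l": 4}[suffix]
    # x86 is little-endian: the least significant byte goes to the lowest address.
    memory[address:address + width] = value.to_bytes(width, "little")


stack = bytearray(b"\xff" * 4)
store(stack, 0, 10, "b")   # models: movb $10, (%esp)
print(stack.hex())         # 0affffff

stack = bytearray(b"\xff" * 4)
store(stack, 0, 10, "l")   # models: movl $10, (%esp)
print(stack.hex())         # 0a000000
```

The same constant and the same address leave different bytes behind, which is why an immediate-to-memory move has to state its size one way or another.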

For the "inversion": it is the other way round. The Intel syntax mimics what occurs in C, in which values are computed on the right, then written to what is on the left. Thus, the writing goes right to left, in the "reverse" direction, considering that reading goes left to right. The AT&T syntax reverts to the "normal" direction. At least so they considered; since they had decided to use their own syntax anyway, they thought that they could put the operands in what they saw as "the right ordering". This is mostly a convention, but not an illogical one. The C convention mimics mathematical notation, except that mathematics is about defining values ("let x be the value 5") and not about assigning values ("we write the value 5 into a slot called 'x'"). The AT&T choice makes sense. It is confusing only when you are converting C code to assembly, a task which should usually be left to a C compiler.
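The two orderings, together with the sigils and the size suffix, can be put side by side with a toy converter. This is a deliberately tiny Python sketch: it assumes only 32-bit registers and non-negative decimal constants, always appends 'l', and `intel_to_att` and `att_operand` are invented names, not part of any real tool.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esp", "ebp", "esi", "edi"}  # 32-bit subset only


def att_operand(operand):
    """Add the AT&T sigil: '%' for a register, '$' for a decimal constant."""
    if operand in REGISTERS:
        return "%" + operand
    if operand.isdigit():
        return "$" + operand
    raise ValueError(f"unsupported operand: {operand!r}")


def intel_to_att(line):
    """Convert 'op dest, src' (Intel order) to 'opl src, dest' (AT&T order)."""
    mnemonic, operands = line.split(None, 1)
    dest, src = (part.strip() for part in operands.split(","))
    return f"{mnemonic}l {att_operand(src)}, {att_operand(dest)}"


print(intel_to_att("mov eax, 5"))    # movl $5, %eax
print(intel_to_att("add ebx, eax"))  # addl %eax, %ebx
```

Read aloud, the AT&T result is "move 5 into eax", while the Intel form reads like the assignment `eax = 5`.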

The last part of your question 5 is interesting from a historical point of view. The GNU tools for x86 followed the AT&T syntax because at that time they were trying to take hold in the Unix world ("GNU" means "GNU's Not Unix") and were competing with the Unix tools; Unix was under the control of AT&T. This was before the days of Linux or even Windows 3.0; PCs were 16-bit systems. Unix used the AT&T syntax, hence GNU used the AT&T syntax.

The good question is then: why did AT&T find it smart to invent their own syntax? As described above, they had some reasons, which were not without merit. The cost of using your own syntax, of course, is that it limits interoperability. In those days, a C compiler or assembler made no real sense as a separate tool: in a Unix system, they were meant to be provided by the OS vendor. Also, Intel was not a big player in the Unix world; big systems mostly used VAX or Motorola 680x0 derivatives. Nobody had figured out that the MS-DOS PC would turn, twenty years later, into the dominant architecture in the desktop and server worlds.

Thomas Pornin answered Sep 28 '22 05:09



1-2, 5: They probably chose to prefix registers and such to make it easier to parse; you know directly at the first character what kind of token it is.

4: No.

6: Again, probably to make it easier for the parser to figure out what instruction to output.

7: Actually this makes more sense grammatically: move what to where. Perhaps the mov instruction should be an ld instruction.
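The parsing point in 1-2 and 5 can be illustrated with a sketch: with sigils, an operand's kind follows from its first character alone, with no table of register names. This is a Python illustration under that assumption, not code from any real assembler, and `classify` is a made-up name.

```python
def classify(operand):
    """Classify an AT&T operand by looking only at its first character."""
    first = operand[0]
    if first == "$":
        return "immediate"
    if first == "%":
        return "register"
    return "memory"  # e.g. 16(%esp) or a bare symbol name


for operand in ["$10", "%eax", "16(%esp)", "counter"]:
    print(operand, "->", classify(operand))
```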

Don't get me wrong, I think AT&T syntax is horrible.

Jens Björnhager answered Sep 28 '22 04:09
