
Trying to assemble the output of a disassembler (such as objdump) [duplicate]

Possible Duplicate:
Disassembling, modifying and then reassembling a Linux executable

I've been told that assembly and disassembly are not inverses. Apparently, you can't disassemble a program, feed that output directly into an assembler, and expect the result to run correctly, because information is lost.

My question is, why is information lost? Also, what information is lost?

matzahboy asked Dec 14 '11 18:12




2 Answers

One important thing that disassemblers (or their users) routinely do not preserve is the instruction encoding. Some instructions can be encoded in multiple different ways, e.g.:

mov rdx, -1 is either 48,BA,FF,FF,FF,FF,FF,FF,FF,FF (10 bytes) or 48,C7,C2,FF,FF,FF,FF (7 bytes).

If the rest of the program somehow depends on that instruction being exactly 10 (or 7) bytes long, or on those specific byte values, and the assembler picks a different encoding for mov rdx, -1 than the original program used, then disassembling and reassembling gives you a different program that may behave differently. For instructions with more than one possible encoding, the reassembled code has to reproduce the exact bytes from the original program (e.g. 48,BA,FF,FF,FF,FF,FF,FF,FF,FF), not whatever encoding the assembler happens to pick for the mnemonic (mov rdx, -1).
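
To make the size difference concrete, here is a minimal Python sketch that builds both encodings by hand. The helper names are made up for illustration; the opcode bytes are the ones quoted above (REX.W + B8+rd with a 64-bit immediate, and REX.W + C7 /0 with a sign-extended 32-bit immediate).

```python
import struct

def encode_mov_rdx_imm64(value):
    # Hypothetical helper: 48 BA + 8-byte little-endian immediate.
    return bytes([0x48, 0xBA]) + struct.pack("<q", value)

def encode_mov_rdx_imm32(value):
    # Hypothetical helper: 48 C7 C2 + 4-byte little-endian immediate,
    # which the CPU sign-extends to 64 bits.
    return bytes([0x48, 0xC7, 0xC2]) + struct.pack("<i", value)

def as_hex(encoding):
    # Format bytes the same way the answer writes them: 48,BA,FF,...
    return ",".join(f"{b:02X}" for b in encoding)

long_form = encode_mov_rdx_imm64(-1)
short_form = encode_mov_rdx_imm32(-1)

print(as_hex(long_form), len(long_form), "bytes")
print(as_hex(short_form), len(short_form), "bytes")
# Anything that follows the instruction shifts by this many bytes
# if the assembler picks the other form.
print("offset shift:", len(long_form) - len(short_form))
```

Both byte strings load the same value into rdx, but everything placed after the instruction ends up at a different offset depending on which one the assembler emits.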

There are other things the assembler or linker may do differently, such as adding alignment padding to code or data, or naming and ordering sections/segments differently in the output file. These usually aren't a problem, but again, if the original program has unusual dependencies on them, the reassembled program will work differently.

Alexey Frunze answered Nov 15 '22 11:11


It's not a loss, it's actually a gain. It sounds like you have not tried this yet, so why not try it? Take this code:

.global reset
reset:

  mov #0x0280,r1
  call #notmain
  jmp hang

.global hang
hang:
  jmp hang

Assemble that, and objdump shows it like this:

0000f800 <reset>:
    f800:   31 40 80 02     mov #640,   r1  ;#0x0280
    f804:   b0 12 b2 f8     call    #0xf8b2 
    f808:   00 3c           jmp $+2         ;abs 0xf80a

0000f80a <hang>:
    f80a:   ff 3f           jmp $+0         ;abs 0xf80a

You can see the core code is still there. If you have a text editor with column (rectangular) cut and paste, you can cut that code out of the middle and reassemble it, either directly or with a little massaging.
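
If you would rather not do the column cut by hand, a rough Python sketch of the same idea follows. It assumes the listing layout shown above (address, colon, hex bytes, instruction text, optional ; comment); the function name and regular expressions are my own and not part of objdump. Note that it does none of the massaging: targets such as #0xf8b2 and $+2 come out as printed, not as symbolic labels.

```python
import re

# Matches symbol lines such as "0000f800 <reset>:".
LABEL_RE = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# Matches instruction lines: address, colon, hex byte pairs,
# instruction text, and an optional trailing ";" comment.
INSN_RE = re.compile(
    r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2}\s)+)\s*([^;]*?)\s*(?:;.*)?$"
)

def extract_source(listing):
    """Keep only labels and instruction text from a listing."""
    out = []
    for line in listing.splitlines():
        label = LABEL_RE.match(line)
        if label:
            out.append(label.group(1) + ":")
            continue
        insn = INSN_RE.match(line)
        if insn:
            # Collapse the column padding inside the instruction text.
            out.append("  " + " ".join(insn.group(2).split()))
    return "\n".join(out)

LISTING = """\
0000f800 <reset>:
    f800:   31 40 80 02     mov #640,   r1  ;#0x0280
    f804:   b0 12 b2 f8     call    #0xf8b2
    f808:   00 3c           jmp $+2         ;abs 0xf80a

0000f80a <hang>:
    f80a:   ff 3f           jmp $+0         ;abs 0xf80a
"""

print(extract_source(LISTING))
```

The address, raw byte, and comment columns are dropped, which is exactly the information the listing added on top of the original source.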

There is no reason you could not have a disassembler whose output can be reassembled; I have done it many times and seen it many times. The thing is, the usual use case for a disassembler is to see that extra information (addresses, raw bytes, resolved targets). A use case for a disassembler whose output can be reassembled is something like hacking someone's code.

I highly recommend that people write disassemblers anyway, and this would be a good reason to. It is educational both for learning the instruction set and for learning how it is encoded. With a variable-length instruction set (x86) there is a lot more to learn, so I recommend NOT learning one of those first. Start with ARM or Thumb, or at least something less painful than x86, like the MSP430. A good way to test your disassembler is to have it output code that can be reassembled: assemble, disassemble, assemble, and if the two assembler outputs match, your disassembler did a good job.
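
As a sketch of that assemble, disassemble, assemble test, here is a toy version in Python. The instruction set is entirely hypothetical (one opcode byte plus one operand byte per instruction) and only exists to show the shape of the check; with a real toolchain you would run your assembler and disassembler in place of these two functions.

```python
# Hypothetical toy ISA, not a real architecture: every instruction
# is one opcode byte followed by one operand byte.
OPCODES = {"nop": 0x00, "load": 0x01, "add": 0x02, "jmp": 0x03}
MNEMONICS = {code: name for name, code in OPCODES.items()}

def assemble(source):
    """Turn 'mnemonic operand' lines into bytes; ';' starts a comment."""
    out = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        mnemonic, operand = line.split()
        out += bytes([OPCODES[mnemonic], int(operand, 0)])
    return bytes(out)

def disassemble(binary):
    """Turn bytes back into text that assemble() accepts."""
    lines = []
    for i in range(0, len(binary), 2):
        opcode, operand = binary[i], binary[i + 1]
        lines.append(f"{MNEMONICS[opcode]} 0x{operand:02x}")
    return "\n".join(lines)

def round_trip_matches(source):
    """Assemble, disassemble, assemble again, and compare the binaries."""
    first = assemble(source)
    second = assemble(disassemble(first))
    return first == second

program = """
load 5      ; decimal operand
add 0x10    ; hex operand
jmp 0
"""

print(disassemble(assemble(program)))
print("round trip ok:", round_trip_matches(program))
```

The disassembled text does not have to look like the original source (comments and number formats are gone); what the test checks is that the two binaries are identical.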

old_timer answered Nov 15 '22 10:11