I'm learning assembly, and my motivation is to be able to reverse engineer. I'm trying to find the assembler I should begin with, so that I can then find tutorials and start writing some assembly.
I came to know that MASM has a lot of built-in constructs, so I would mostly be using those instead of coding them myself, which is what I would have to do if I chose NASM.
My question is: is that true? If it is, which assembler do you suggest for learning assembly from a reverse engineer's perspective, and what are some good tutorials for it?
Also, do you have other suggestions regarding reversing? An alternative approach or something?
P.S: I have read many articles and questions here, but am still confused.
Masm, the Microsoft assembler, is the most commonly taught x86 assembler. Unfortunately, its use is limited to Windows. nasm is a free cross-platform x86 assembler which supports all the common x86 operating systems – Linux, MacOS X and Windows. Unlike the GNU assembler, it uses the same Intel syntax that masm does.
An asm file refers to an assembly source file. It can be for any assembler or platform. A *.nasm file is also an assembly source file, but it is for the NASM assembler.
The Microsoft Macro Assembler (MASM) is an x86 assembler that uses the Intel syntax for MS-DOS and Microsoft Windows. Beginning with MASM 8.0, there are two versions of the assembler: One for 16-bit & 32-bit assembly sources, and another (ML64) for 64-bit sources only.
The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture. It can be used to write 16-bit, 32-bit (IA-32) and 64-bit (x86-64) programs. It is considered one of the most popular assemblers for Linux.
My recommendation purely from a "reverse engineering" perspective is to understand how a compiler translates high-level concepts into assembly language instructions in the first place. Understanding how register allocation is done in various compilers, and how various optimizations obscure the high-level representation of nested loops and similar constructs, is more important than being able to write one particular dialect of assembly input.
Your best bet is to start with the assembly language intermediate files from source code that you write (see this question for more information). Then you can change the source and see how it affects the intermediate files. The other place to start is by using an interactive disassembler like IDA Pro.
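As a rough illustration of that workflow, here is a small C function to experiment with. The file name sum_positive.c and the function itself are made up for this example. Assuming GCC is the compiler, "gcc -S -O0 sum_positive.c" writes the assembly listing to sum_positive.s, and repeating the command with -O2 shows how optimization rearranges the nested loops and the branch. On x86 targets, adding -masm=intel switches the listing to Intel syntax. The exact output depends on the compiler version and the target, so treat it as something to explore rather than a fixed result.

```c
/* sum_positive.c: a hypothetical file name used only for this illustration. */
#include <stddef.h>

/* Sums the positive entries of a rows x cols matrix stored row by row. */
int sum_positive(const int *m, size_t rows, size_t cols)
{
    int total = 0;

    for (size_t r = 0; r < rows; r++) {         /* outer loop */
        for (size_t c = 0; c < cols; c++) {     /* inner loop */
            int a = m[r * cols + c];            /* index arithmetic shows up as address computation */
            if (a > 0)                          /* typically a compare plus a conditional branch or move */
                total += a;
        }
    }
    return total;
}
```

Changing one thing at a time (the loop order, the element type, the optimization level) and comparing the listings is usually more instructive than reading a single large listing.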
Actually writing assembly language programs and learning the syntax of NASM, MASM, gas, or as is the easiest part, and it does not really matter which one you learn. They are very similar because the syntax of the source language is very basic. If you are planning to learn how to disassemble and understand what a program is doing, then I would completely ignore macro assemblers, since the macros completely disappear during translation and you will not see them when looking at disassembler output.
Diatribe on Learning Assembly
Learning an assembly language is different than learning a higher level programming language. There are fewer syntactical constructs if you ignore macro assemblers. The problem is that every compiler chain has a slightly different representation so you have to concentrate on the concepts such as supported address modes, register restrictions, etc. These aren't part of the language per se as they are dictated by the hardware.
The approach that I took (partially because the university forced me to) is to explore and understand the hardware itself (e.g., number of registers, size of registers, types of branch instructions supported, etc.) and slightly more academic concepts, such as interrupts and using bitwise manipulation for integer math, before you start to write assembly language programs. This is a much longer route, but it results in a rich understanding of assembly and how to write high-performance programs.
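To make "bitwise manipulation for integer math" concrete, here is a minimal sketch in C of a few common identities for unsigned integers. The function names are made up for this example. Optimizing compilers often emit shift, add, and mask sequences like these in place of multiply and divide instructions, so recognizing them helps when reading disassembly.

```c
#include <stdint.h>

/* x * 10 rewritten as x * 8 + x * 2 using shifts (wraps on overflow like a multiply would). */
uint32_t times_ten(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Unsigned division by 8 is a right shift by 3. */
uint32_t div_by_eight(uint32_t x)
{
    return x >> 3;
}

/* Remainder modulo 8 is a mask of the low three bits. */
uint32_t mod_eight(uint32_t x)
{
    return x & 7u;
}

/* A power of two has exactly one bit set; clearing the lowest set bit leaves zero. */
int is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

These identities hold for unsigned values. Signed division by a power of two needs an extra rounding adjustment, which is why compiler output for signed types looks more involved.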
The interesting thing is that in the time that I spent learning assembly and compiler construction (which is intrinsically related), I actually wrote very few assembly programs. More often, I am required to write little snippets of inline assembly here and there (e.g. setting up index registers when the runtime loader didn't). I have spent an enormous amount of time dissecting crash dumps from a memory location, loader map file, and assembler output listings. I can honestly say that the syntax of each assembler is dramatically different as well as what various compilers will do to muddle the intent into fast or small code.
Learning how to write assembly programs was the least worthwhile part of the education process. It was necessary to understand how source is translated into the bits and bytes that the computer executes, but it was not what I really needed in order to reverse engineer from a raw binary (disassembler -> assembly listing -> best guess of high-level intent) or a memory dump. I do more of the latter, but the requirements of the job are the same.
move if (a > 0) to mov.b r0,d0 ... bnz $L
Start by learning about computer architecture (e.g., read something from Andrew Tanenbaum), then how an OS actually loads and runs a program (Levine's Linkers & Loaders), then compile simple programs in C/C++ and look at the assembly language listings.