I've dabbled in and out of trying to get a grasp on how to do some simple programming in assembly. I am going over a tutorial hello world program and most of the stuff they have explained makes sense, but they are really glossing over it. I would like some help in understanding some different parts of the program. Here is their tutorial example -
section .text
global main ;must be declared for linker (ld)
main: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg ;length of our dear string
There is the text section and the data section. The data section seems to hold our user defined info for the program. It looks like the "frame" of the program is in the text section and the "meat" is in the data section... ? I assume the program when compiled executes the text section with data from the data section filled into the text section? The bss/text/data section interaction is kind of foreign to me. Also in the data section where the msg and len.... variables? are mentioned, they are followed by some information i'm not sure what to make of. msg is followed by db, what does this mean? Then the text, and then 0xa, what is the 0xa for? Also len is followed by equ, does this mean equals? len equals dollarsign minus msg variable? What is the dollar sign? A sort of operator? Also the instructions in the text section, mov ebx,1 apparently, or seems to tell the program to utilize STDOUT? Is moving 1 to the ebx register a standard instruction for setting stdout?
Perhaps someone has a little more thorough tutorial to recommend? I am looking to get dirty with assembly and need to teach myself some of the... "core fundamentals" if you will. Thanks for all the help!
An assembler instruction is a request to the assembler to do certain operations during the assembly of a source module; for example, defining data constants, reserving storage areas, and defining the end of the source module.
Each source statement may include up to four fields: a label, an operation (instruction mnemonic or assembler directive), an operand, and a comment. The following are examples of an assembly directive and a regular machine instruction.
msg is a global variable pointing to the string that follows - the db means data byte , indicating that the assembler should just emit the literal bytes that follow.
[NB - I don't know what assembler dialect you're using, so I just took some "best guesses" at some parts of this stuff. If someone can help clarify, that would be great.]
It looks like the "frame" of the program is in the text section and the "meat" is in the data section... ?
The text section contains the executable instructions that make up your program. The data section contains the data that said program is going to operate on. The reason there are two different sections is to allow the program loader and operating system to be able to provide you with some protections. The text section can be loaded in to read-only memory, for example, and the data section can be loaded into memory marked as "non-executable", so code isn't accidentally (or maliciously) executed from that region.
I assume the program when compiled executes the text section with data from the data section filled into the text section?
The program (instructions in the text section) normally references symbols and manipulates data in the data section, if that's what you're asking.
The bss/text/data section interaction is kind of foreign to me.
The BSS section is similar to the data section, except it's all zero-initialized. That means it doesn't need to actually take up space in the executable file. The program loader just has to make an appropriately sized block of zero bytes in memory. Your program doesn't have a BSS section.
Also in the data section where the msg and len.... variables? are mentioned, they are followed by some information i'm not sure what to make of. msg is followed by db, what does this mean?
msg
and len
are variables of a sort, yes. msg
is a global variable pointing to the string that follows - the db
means data byte
, indicating that the assembler should just emit the literal bytes that follow. len
is being set to the length of the string (more below).
Then the text, and then 0xa, what is the 0xa for?
0x0a
is the hexadecimal value of an ASCII newline character.
Also len is followed by equ, does this mean equals?
Yes.
len equals dollarsign minus msg variable? What is the dollar sign? A sort of operator?
The $
means "the current location". As the assembler is going about its job, it keeps track of how many bytes of data and code it's generated in a counter. So this code is saying: "subtract the location of the msg
label from the current location and store that number as len
". Since the "current location" is just past the end of the string, you get the length there.
Also the instructions in the text section, mov ebx,1 apparently, or seems to tell the program to utilize STDOUT? Is moving 1 to the ebx register a standard instruction for setting stdout?
The program is making a system call via the int 0x80
instruction. Before that, it has to set things up in a way the OS expects - in this case that looks like putting a 1
in ebx1
to mean stdout
, along with the other three registers - the message length in edx
, a pointer to the message in ecx
, and the system call number in eax
. I'd guess you're on linux - you can look up a system call table from google without too much trouble, I'm sure.
Perhaps someone has a little more thorough tutorial to recommend?
Sorry, not off the top of my head.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With