I am trying to learn assembly language as a hobby, and I frequently use gcc -S to produce assembly output. This is pretty straightforward, but I fail to assemble that output. I was just curious whether this can be done at all. I tried both the standard assembly output and Intel syntax (using -masm=intel). Neither can be assembled with nasm and linked with ld.
Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled.
To be more precise, I used the following C code.
>> cat csimp.c
int main (void){
int i,j;
for(i=1;i<21;i++)
j= i + 100;
return 0;
}
I generated assembly with gcc -S -O0 -masm=intel csimp.c, tried to assemble it with nasm -f elf64 csimp.s, and tried to link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:
csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected
This is most probably due to incompatible assembly syntax. My hope is that I can fix this without having to manually correct the output of gcc -S.
Edit:
I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was still not able to produce NASM-format assembly. You can see the output of objconv below.
Therefore I still need your help.
>>cat csimp.asm
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64
global main: ; the ':' should be removed!
SECTION .text ; section number 1, code
main: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov dword [rbp-4H], 1 ; 0004 _ C7. 45, FC, 00000001
jmp ?_002 ; 000B _ EB, 0D
?_001: mov eax, dword [rbp-4H] ; 000D _ 8B. 45, FC
add eax, 100 ; 0010 _ 83. C0, 64
mov dword [rbp-8H], eax ; 0013 _ 89. 45, F8
add dword [rbp-4H], 1 ; 0016 _ 83. 45, FC, 01
?_002: cmp dword [rbp-4H], 20 ; 001A _ 83. 7D, FC, 14
jle ?_001 ; 001E _ 7E, ED
pop rbp ; 0020 _ 5D
ret ; 0021 _ C3
; main End of function
SECTION .data ; section number 2, data
SECTION .bss ; section number 3, bss
Apparent solution:
I made a mistake when cleaning up the output of objconv. I should have run:
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" csimp.asm
All steps can be condensed into a bash script:
#! /bin/bash
set -e      # stop at the first failing step
a=${1%.c}   # strip the .c file extension
# compile to an object file with minimal extra information
gcc -fno-asynchronous-unwind-tables -s -c "${a}.c"
# convert the object file to nasm format
./objconv/objconv -fnasm "${a}.o"
# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g; /default *rel/d" "${a}.asm"
# run nasm for 64-bit binary
nasm -f elf64 "${a}.asm"
# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s "${a}.o"
Running this script, I get the following ld warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
The executable produced this way crashes with a segmentation fault. I would appreciate your help.
Yes, gcc can also assemble assembly source code. Alternatively, you can invoke as, which is the assembler. (gcc is just a "driver" program that uses heuristics, mainly the input file extensions, to call the C compiler, C++ compiler, assembler, linker, etc.)
Luckily, gcc does not output binary machine code directly. Instead, it internally writes assembler code, which is then translated by as into binary machine code (gcc actually creates more intermediate representations along the way). This internal assembler code can be written to a file, with some annotations to make it easier to read.
I think the entry-point problem you hit comes from running ld on an object file whose entry point is named main, while ld looks for an entry point named _start.
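This also explains the crash. When ld cannot find _start, it falls back to the start of the .text section (the 0x400080 in the warning), which here is main. Nothing called main, so there is no return address on the stack. At process entry on Linux x86-64, rsp points at argc, so the ret at the end of main jumps to that value and the process segfaults. Renaming main to _start silences the warning but does not fix this.

One possible way around it is a tiny separate _start stub that calls main and then makes the exit system call. This is a sketch that assumes Linux x86-64 and the cleaned csimp.asm from the question (exporting main with global main). The file name start_stub.asm is hypothetical.

```shell
# Hypothetical stub file; assumes Linux x86-64 and a csimp.asm that exports main.
cat > start_stub.asm <<'EOF'
        global  _start
        extern  main
        section .text
_start:
        call    main        ; nothing called _start, so call main ourselves; its ret returns here
        mov     edi, eax    ; main's return value becomes the exit status
        mov     eax, 60     ; 60 = exit system call number on Linux x86-64
        syscall
EOF

nasm -f elf64 start_stub.asm -o start_stub.o
nasm -f elf64 csimp.asm -o csimp.o
ld -m elf_x86_64 -o csimp start_stub.o csimp.o   # _start is now defined, so ld has a real entry point
./csimp; echo "exit status: $?"
```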
There are a couple of considerations. If you link with the C library to use functions like printf (typically by linking through gcc), main is the entry point you write, and the C runtime startup code supplies _start and calls main for you. If you do not link with the C library and use ld directly, ld expects the object itself to provide _start. Your script is very close, but to fully automate the process for any source file you will need some way to tell which of the two entry points you need.
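One possible way to make that decision automatically is to check the object file for undefined symbols with nm -u. This is a sketch under the assumption that any undefined symbol (printf, puts, __stack_chk_fail, ...) has to come from the C library. The function names has_undefined_symbols and link_object are hypothetical and not part of the script above.

```shell
# Reads "nm -u" output on stdin; succeeds if at least one undefined ("U") symbol is listed.
has_undefined_symbols() {
    grep -q '[[:space:]]U[[:space:]]'
}

# link_object OBJ OUT (hypothetical helper):
#   undefined symbols present -> link with gcc; the C runtime provides _start and calls main
#   no undefined symbols      -> link with plain ld; the object (or a stub) must define _start
link_object() {
    local obj=$1 out=$2
    if nm -u "$obj" | has_undefined_symbols; then
        gcc -o "$out" "$obj"
    else
        ld -m elf_x86_64 -o "$out" "$obj"
    fi
}
```

For example, link_object s3.obj s3 would take the gcc branch for the printf example below. For csimp.o, the ld branch only works once _start exists, either through the sed rename or a separate stub.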
For example, the following is a conversion, using your approach, of a source file that calls printf. It was converted to nasm format using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert it with objconv to a nasm-format assembly file:
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings, probably because of an option I missed; I just ran the file through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if there are no standard library functions and you link with ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this; these are just an example)
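To check what those two expressions do before running them with sed -i, you can pipe a few sample lines through them. The sample lines below are made up to mirror the shape of the cleaned objconv output; they are not taken from a real run. Keep in mind that a function renamed to _start still must not end with ret.

```shell
# Made-up sample lines shaped like the cleaned objconv output.
sample='global main
main:   ; Function begin
; main End of function'

# Expression 1 renames the label at the start of a line;
# expression 2 replaces " main" and the rest of that line with " _start".
renamed=$(printf '%s\n' "$sample" | sed -e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/')
printf '%s\n' "$renamed"
```

The first two lines become global _start and _start:   ; Function begin. The closing comment is reduced to ; _start, because the second expression also swallows everything after main.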
Assemble with nasm (replacing the original object file):
nasm -felf64 -o s3.obj s3.asm
Link using gcc:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
You basically can't, at least not directly. With -masm=intel, GCC does output assembly in Intel syntax, but in GAS's dialect of it; NASM, MASM and TASM each have their own Intel syntax. They are all largely based on Intel's, but there are differences that another assembler may not understand, so assembling fails.
The closest thing is probably having objdump show the disassembly in Intel syntax:
objdump -d $file -M intel
Peter Cordes points out in the comments that the assembler directives will still target GAS, so NASM, for example, won't recognize them. Some have similar names, but GAS directives start with a ., as in .section .text (vs. NASM's section .text).