
How to generate assembly code with gcc that can be compiled with nasm [duplicate]

Tags: gcc, assembly, nasm

I am trying to learn assembly language as a hobby, and I frequently use gcc -S to produce assembly output. That part is straightforward, but I cannot assemble the output. I was just curious whether this can be done at all. I tried both the default (AT&T) syntax and Intel syntax via -masm=intel. Neither can be assembled with nasm and linked with ld.

Therefore I would like to ask whether it is possible to generate assembly code that can then be assembled.

To be more precise, I used the following C code.

 >> cat csimp.c
 int main(void)
 {
     int i, j;
     for (i = 1; i < 21; i++)
         j = i + 100;
     return 0;
 }

I generated the assembly with gcc -S -O0 -masm=intel csimp.c, then tried to assemble it with nasm -f elf64 csimp.s and link it with ld -m elf_x86_64 -s -o test csimp.o. The output I got from nasm reads:

csimp.s:1: error: attempt to define a local label before any non-local labels
csimp.s:1: error: parser: instruction expected
csimp.s:2: error: attempt to define a local label before any non-local labels
csimp.s:2: error: parser: instruction expected

This is most probably because nasm does not accept the syntax that gcc emits. My hope is that I can fix this without having to manually correct the output of gcc -S.


Edit:

I was given a hint that my problem is solved in another question; unfortunately, after testing the method described there, I was not able to produce nasm assembly format. You can see the output of objconv below. Therefore I still need your help.

>>cat csimp.asm 
; Disassembly of file: csimp.o
; Sat Jan 30 20:17:39 2016
; Mode: 64 bits
; Syntax: YASM/NASM
; Instruction set: 8086, x64

global main:  ; **the ':' should be removed !!!** 


SECTION .text                                           ; section number 1, code

main:   ; Function begin
        push    rbp                                     ; 0000 _ 55
        mov     rbp, rsp                                ; 0001 _ 48: 89. E5
        mov     dword [rbp-4H], 1                       ; 0004 _ C7. 45, FC, 00000001
        jmp     ?_002                                   ; 000B _ EB, 0D

?_001:  mov     eax, dword [rbp-4H]                     ; 000D _ 8B. 45, FC
        add     eax, 100                                ; 0010 _ 83. C0, 64
        mov     dword [rbp-8H], eax                     ; 0013 _ 89. 45, F8
        add     dword [rbp-4H], 1                       ; 0016 _ 83. 45, FC, 01
?_002:  cmp     dword [rbp-4H], 20                      ; 001A _ 83. 7D, FC, 14
        jle     ?_001                                   ; 001E _ 7E, ED
        pop     rbp                                     ; 0020 _ 5D
        ret                                             ; 0021 _ C3
; main End of function


SECTION .data                                           ; section number 2, data


SECTION .bss                                            ; section number 3, bss

Apparent solution:

I made a mistake when cleaning up the output of objconv. I should have run:

sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" csimp.asm
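To see what this cleanup does, here is a minimal sketch that runs the same sed expressions over a few sample lines. The sample header is an assumption modelled on the kind of lines objconv -fnasm emits; it is not copied from an actual run.

```shell
#!/bin/bash
# Sample lines are an assumption: they imitate an objconv -fnasm header,
# they are not the output of an actual run.
sample='default rel

global main: function

SECTION .text   align=1 execute
SECTION .data   align=1 noexecute'

# Same expressions as the sed call above, applied to stdin instead of in place.
cleaned=$(printf '%s\n' "$sample" |
  sed "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d")

# Show what is left after the cleanup.
printf '%s\n' "$cleaned"
```

After the cleanup only a plain global main line and bare SECTION lines remain (with some trailing spaces where the qualifiers used to be).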

All steps can be condensed in a bash script:

#!/bin/bash

a="${1%.c}"   # strip the .c file extension

# compile to an object file with minimal extra information
gcc -fno-asynchronous-unwind-tables -s -c "${a}.c"

# convert the object file to nasm format
./objconv/objconv -fnasm "${a}.o"

# remove unnecessary objconv information
sed -i "s/align=1//g ; s/[a-z]*execute//g ; s/: *function//g;  /default *rel/d" "${a}.asm"

# assemble with nasm into a 64-bit ELF object file
nasm -f elf64 "${a}.asm"

# link --> see comment of MichaelPetch below
ld -m elf_x86_64 -s "${a}.o"

Running this script, I get the ld warning:

 ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080 

The executable produced this way crashes with a segmentation fault. I would appreciate your help.

Alexander Cska asked Jan 30 '16 13:01




2 Answers

I think the entry-point error came from running ld on an object file whose entry point is named main, while ld was looking for an entry point named _start.

There are a couple of considerations. First, if you link through gcc against the C library (for functions like printf), the C runtime startup code supplies _start and calls main, so your file must define main. If you link with ld alone and without the C library, ld expects your file to define _start. Your script is very close, but to fully automate the process for any source file you will need some way to decide which entry point applies.
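One way to automate that choice, sketched under assumptions: treat any extern declaration in the converted .asm file as a sign that the C library is needed and link through gcc; otherwise use ld with a _start entry point. The helper name pick_linker and the extern heuristic are hypothetical illustrations; they are not part of objconv or of the script in the question.

```shell
#!/bin/bash
# pick_linker is a hypothetical helper: it prints "gcc" when the NASM source
# declares external symbols (e.g. "extern printf") and "ld" otherwise.
# Assumption: external references show up as lines starting with "extern".
pick_linker() {
  if grep -Eq '^[[:space:]]*extern[[:space:]]' "$1"; then
    echo gcc   # the C runtime supplies _start and calls main
  else
    echo ld    # no libc: the file itself must define _start
  fi
}

# Two tiny sample sources, written to a temporary directory for the demo.
tmp=$(mktemp -d)
printf 'extern printf\nglobal main\nmain:\n' > "$tmp/with_libc.asm"
printf 'global main\nmain:\n' > "$tmp/no_libc.asm"

with_libc=$(pick_linker "$tmp/with_libc.asm")
no_libc=$(pick_linker "$tmp/no_libc.asm")
echo "with_libc.asm -> $with_libc"
echo "no_libc.asm   -> $no_libc"
rm -rf "$tmp"
```

Linking through gcc also works for files without external references as long as main is kept, so the ld branch only matters when a libc-free binary is the goal.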

For example, here is a conversion, using your approach, of a source file that calls printf. It was converted to nasm format with objconv as follows:

Generate the object file:

gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj

Convert with objconv to a nasm format assembly file:

objconv -fnasm s3.obj

(note: my version of objconv added DOS line endings, probably because I missed an option; I just ran the file through dos2unix)

Using a slightly modified version of your sed call, tweak the contents:

sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm

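(note: if the program uses no standard library functions and you link with ld, change main to _start by adding the following expressions to your sed call; be aware that a _start entry point has no caller to return to, so it cannot finish with ret and must end with an exit system call instead)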

-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'

(there are probably more elegant expressions for this; this is just an example)
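As a rough illustration of what those two expressions do, the sketch below applies them to a few sample lines. The sample is an assumption: a cleaned-up objconv listing reduced to the lines that mention main.

```shell
#!/bin/bash
# Sample lines are an assumption: a cleaned-up objconv listing reduced to the
# lines that mention main.
sample='global main
main:   ; Function begin
        ret
; main End of function'

# Same two expressions as above, applied to stdin instead of in place.
renamed=$(printf '%s\n' "$sample" |
  sed -e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/')

# Show the renamed listing.
printf '%s\n' "$renamed"
```

The second expression discards everything that follows " main" on a line, so it also truncates the trailing comment here, and it would rewrite any operand that merely begins with main. An anchored pattern would be safer for larger files.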

Assemble with nasm (replacing the original object file):

nasm -felf64 -o s3.obj s3.asm

Link with gcc:

gcc -o s3 s3.obj

Test

$ ./s3

 sizeof test : 40

 myint  : 0  0
 mychar : 4  4
 myptr  : 8  8
 myarr  : 16  16
 myuint : 32  32
David C. Rankin answered Sep 18 '22 08:09


You basically can't, at least not directly. GCC can output assembly in Intel syntax, but NASM, MASM and TASM each have their own flavor of Intel syntax. The flavors largely overlap, but the differences are enough that the assembler may not understand the code and fail to assemble it.

The closest thing is probably having objdump show the disassembly in Intel syntax:

objdump -d $file -M intel

Peter Cordes points out in the comments that the assembler directives will still target GAS, so NASM, for example, won't recognize them. They typically have the same names, but GAS directives start with a ., as in .section .text (vs section .text).
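A quick way to see how much of a gcc -S listing consists of such directives is to filter for lines whose first non-blank character is a dot. The sketch below does that on an embedded sample; the sample is an assumption, a shortened imitation of the header that gcc -S -masm=intel typically emits, not the output of an actual run.

```shell
#!/bin/bash
# Sample is an assumption: a shortened imitation of a gcc -S -masm=intel
# listing, not the output of an actual run.
sample='        .file   "csimp.c"
        .intel_syntax noprefix
        .text
        .globl  main
        .type   main, @function
main:
        push    rbp
        mov     rbp, rsp'

# Lines whose first non-blank character is "." are GAS directives.
directives=$(printf '%s\n' "$sample" | grep -E '^[[:space:]]*\.')
count=$(printf '%s\n' "$directives" | grep -c .)

printf '%s\n' "$directives"
echo "GAS directives in sample: $count"
```

NASM reads an identifier with a leading dot as a local label, which would explain the "attempt to define a local label before any non-local labels" errors reported for the first lines of csimp.s in the question.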

edmz answered Sep 17 '22 08:09