Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Basic OS X Assembly and the Mach-O format

I am interested in programming in x86-64 assembly on the Mac OS X platform. I came across this page about creating a 248B Mach-O program, which led me to Apple's own Mach-O format reference. After that I thought I'd make that same simple C program in Xcode and check out the generated assembly.

This was the code:

int main(int argc, const char * argv[])
{
    return 42;
}

But the assembly generated was 334 lines, containing (based on the 248B model) a lot of excess content.

Firstly, why is so much DWARF debug info included in the Release build of a C executable? Secondly, I notice the Mach-O header data is included 4 times (in different DWARF-related sections). Why is this necessary? Finally, the Xcode assembly includes:

.private_extern _main
.globl  _main
_main:
    .cfi_startproc

But in the 248B program, these are all nowhere to be seen - the program instead begins at _start. How is that possible if all programs by definition begin in main?


Full Xcode Assembly:

# Assembly output for main.c
# Generated at 4:04:08 PM on Sunday, January 20, 2013
# Using Release configuration, x86_64 architecture for Tiny target of Tiny project

    .section    __TEXT,__text,regular,pure_instructions
    .file   1 "/Users/####/Desktop/Tiny/Tiny/main.c"
    .section    __DWARF,__debug_info,regular,debug
Lsection_info:
    .section    __DWARF,__debug_abbrev,regular,debug
Lsection_abbrev:
    .section    __DWARF,__debug_aranges,regular,debug
    .section    __DWARF,__debug_macinfo,regular,debug
    .section    __DWARF,__debug_line,regular,debug
Lsection_line:
    .section    __DWARF,__debug_loc,regular,debug
    .section    __DWARF,__debug_pubtypes,regular,debug
    .section    __DWARF,__debug_str,regular,debug
Lsection_str:
    .section    __DWARF,__debug_ranges,regular,debug
Ldebug_range:
    .section    __DWARF,__debug_loc,regular,debug
Lsection_debug_loc:
    .section    __TEXT,__text,regular,pure_instructions
Ltext_begin:
    .section    __DATA,__data
    .section    __TEXT,__text,regular,pure_instructions
    .private_extern _main
    .globl  _main
_main:                                  ## @main
    .cfi_startproc
Lfunc_begin0:
    .loc    1 12 0                  ## /Users/####/Desktop/Tiny/Tiny/main.c:12:0
## BB#0:
    pushq   %rbp
Ltmp2:
    .cfi_def_cfa_offset 16
Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp4:
    .cfi_def_cfa_register %rbp
    ##DEBUG_VALUE: main:argc <- EDI+0
    ##DEBUG_VALUE: main:argv <- RSI+0
    movl    $42, %eax
    .loc    1 15 5 prologue_end     ## /Users/####/Desktop/Tiny/Tiny/main.c:15:5
Ltmp5:
    popq    %rbp
    ret
Ltmp6:
Lfunc_end0:
    .cfi_endproc

Ltext_end:
    .section    __DATA,__data
Ldata_end:
    .section    __TEXT,__text,regular,pure_instructions
Lsection_end1:
    .section    __DWARF,__debug_info,regular,debug
Linfo_begin1:
    .long   127                     ## Length of Compilation Unit Info
    .short  2                       ## DWARF version number
Lset0 = Labbrev_begin-Lsection_abbrev   ## Offset Into Abbrev. Section
    .long   Lset0
    .byte   8                       ## Address Size (in bytes)
    .byte   1                       ## Abbrev [1] 0xb:0x78 DW_TAG_compile_unit
Lset1 = Lstring0-Lsection_str           ## DW_AT_producer
    .long   Lset1
    .short  12                      ## DW_AT_language
Lset2 = Lstring1-Lsection_str           ## DW_AT_name
    .long   Lset2
    .quad   0                       ## DW_AT_entry_pc
    .long   0                       ## DW_AT_stmt_list
Lset3 = Lstring2-Lsection_str           ## DW_AT_comp_dir
    .long   Lset3
    .byte   1                       ## DW_AT_APPLE_optimized
    .byte   2                       ## Abbrev [2] 0x27:0x3e DW_TAG_subprogram
Lset4 = Lstring3-Lsection_str           ## DW_AT_name
    .long   Lset4
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .byte   1                       ## DW_AT_prototyped
    .long   101                     ## DW_AT_type
    .byte   1                       ## DW_AT_external
    .quad   Lfunc_begin0            ## DW_AT_low_pc
    .quad   Lfunc_end0              ## DW_AT_high_pc
    .byte   1                       ## DW_AT_frame_base
    .byte   86
    .byte   3                       ## Abbrev [3] 0x46:0xf DW_TAG_formal_parameter
Lset5 = Lstring5-Lsection_str           ## DW_AT_name
    .long   Lset5
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .long   101                     ## DW_AT_type
Lset6 = Ldebug_loc0-Lsection_debug_loc  ## DW_AT_location
    .long   Lset6
    .byte   3                       ## Abbrev [3] 0x55:0xf DW_TAG_formal_parameter
Lset7 = Lstring6-Lsection_str           ## DW_AT_name
    .long   Lset7
    .byte   1                       ## DW_AT_decl_file
    .byte   11                      ## DW_AT_decl_line
    .long   125                     ## DW_AT_type
Lset8 = Ldebug_loc2-Lsection_debug_loc  ## DW_AT_location
    .long   Lset8
    .byte   0                       ## End Of Children Mark
    .byte   4                       ## Abbrev [4] 0x65:0x7 DW_TAG_base_type
Lset9 = Lstring4-Lsection_str           ## DW_AT_name
    .long   Lset9
    .byte   5                       ## DW_AT_encoding
    .byte   4                       ## DW_AT_byte_size
    .byte   4                       ## Abbrev [4] 0x6c:0x7 DW_TAG_base_type
Lset10 = Lstring7-Lsection_str          ## DW_AT_name
    .long   Lset10
    .byte   6                       ## DW_AT_encoding
    .byte   1                       ## DW_AT_byte_size
    .byte   5                       ## Abbrev [5] 0x73:0x5 DW_TAG_const_type
    .long   108                     ## DW_AT_type
    .byte   6                       ## Abbrev [6] 0x78:0x5 DW_TAG_pointer_type
    .long   115                     ## DW_AT_type
    .byte   6                       ## Abbrev [6] 0x7d:0x5 DW_TAG_pointer_type
    .long   120                     ## DW_AT_type
    .byte   0                       ## End Of Children Mark
Linfo_end1:
    .section    __DWARF,__debug_abbrev,regular,debug
Labbrev_begin:
    .byte   1                       ## Abbreviation Code
    .byte   17                      ## DW_TAG_compile_unit
    .byte   1                       ## DW_CHILDREN_yes
    .byte   37                      ## DW_AT_producer
    .byte   14                      ## DW_FORM_strp
    .byte   19                      ## DW_AT_language
    .byte   5                       ## DW_FORM_data2
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   82                      ## DW_AT_entry_pc
    .byte   1                       ## DW_FORM_addr
    .byte   16                      ## DW_AT_stmt_list
    .byte   6                       ## DW_FORM_data4
    .byte   27                      ## DW_AT_comp_dir
    .byte   14                      ## DW_FORM_strp
    .ascii   "\341\177"             ## DW_AT_APPLE_optimized
    .byte   12                      ## DW_FORM_flag
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   2                       ## Abbreviation Code
    .byte   46                      ## DW_TAG_subprogram
    .byte   1                       ## DW_CHILDREN_yes
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   58                      ## DW_AT_decl_file
    .byte   11                      ## DW_FORM_data1
    .byte   59                      ## DW_AT_decl_line
    .byte   11                      ## DW_FORM_data1
    .byte   39                      ## DW_AT_prototyped
    .byte   12                      ## DW_FORM_flag
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   63                      ## DW_AT_external
    .byte   12                      ## DW_FORM_flag
    .byte   17                      ## DW_AT_low_pc
    .byte   1                       ## DW_FORM_addr
    .byte   18                      ## DW_AT_high_pc
    .byte   1                       ## DW_FORM_addr
    .byte   64                      ## DW_AT_frame_base
    .byte   10                      ## DW_FORM_block1
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   3                       ## Abbreviation Code
    .byte   5                       ## DW_TAG_formal_parameter
    .byte   0                       ## DW_CHILDREN_no
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   58                      ## DW_AT_decl_file
    .byte   11                      ## DW_FORM_data1
    .byte   59                      ## DW_AT_decl_line
    .byte   11                      ## DW_FORM_data1
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   2                       ## DW_AT_location
    .byte   6                       ## DW_FORM_data4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   4                       ## Abbreviation Code
    .byte   36                      ## DW_TAG_base_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   3                       ## DW_AT_name
    .byte   14                      ## DW_FORM_strp
    .byte   62                      ## DW_AT_encoding
    .byte   11                      ## DW_FORM_data1
    .byte   11                      ## DW_AT_byte_size
    .byte   11                      ## DW_FORM_data1
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   5                       ## Abbreviation Code
    .byte   38                      ## DW_TAG_const_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   6                       ## Abbreviation Code
    .byte   15                      ## DW_TAG_pointer_type
    .byte   0                       ## DW_CHILDREN_no
    .byte   73                      ## DW_AT_type
    .byte   19                      ## DW_FORM_ref4
    .byte   0                       ## EOM(1)
    .byte   0                       ## EOM(2)
    .byte   0                       ## EOM(3)
Labbrev_end:
    .section    __DWARF,__apple_names,regular,debug
Lnames_begin:
    .long   1212240712              ## Header Magic
    .short  1                       ## Header Version
    .short  0                       ## Header Hash Function
    .long   1                       ## Header Bucket Count
    .long   1                       ## Header Hash Count
    .long   12                      ## Header Data Length
    .long   0                       ## HeaderData Die Offset Base
    .long   1                       ## HeaderData Atom Count
    .short  1                       ## eAtomTypeDIEOffset
    .short  6                       ## DW_FORM_data4
    .long   0                       ## Bucket 0
    .long   2090499946              ## Hash in Bucket 0
    .long   LNames0-Lnames_begin    ## Offset in Bucket 0
LNames0:
Lset11 = Lstring3-Lsection_str          ## main
    .long   Lset11
    .long   1                       ## Num DIEs
    .long   39
    .long   0
    .section    __DWARF,__apple_objc,regular,debug
Lobjc_begin:
    .long   1212240712              ## Header Magic
    .short  1                       ## Header Version
    .short  0                       ## Header Hash Function
    .long   1                       ## Header Bucket Count
    .long   0                       ## Header Hash Count
    .long   12                      ## Header Data Length
    .long   0                       ## HeaderData Die Offset Base
    .long   1                       ## HeaderData Atom Count
    .short  1                       ## eAtomTypeDIEOffset
    .short  6                       ## DW_FORM_data4
    .long   -1                      ## Bucket 0
    .section    __DWARF,__apple_namespac,regular,debug
Lnamespac_begin:
    .long   1212240712              ## Header Magic
    .short  1                       ## Header Version
    .short  0                       ## Header Hash Function
    .long   1                       ## Header Bucket Count
    .long   0                       ## Header Hash Count
    .long   12                      ## Header Data Length
    .long   0                       ## HeaderData Die Offset Base
    .long   1                       ## HeaderData Atom Count
    .short  1                       ## eAtomTypeDIEOffset
    .short  6                       ## DW_FORM_data4
    .long   -1                      ## Bucket 0
    .section    __DWARF,__apple_types,regular,debug
Ltypes_begin:
    .long   1212240712              ## Header Magic
    .short  1                       ## Header Version
    .short  0                       ## Header Hash Function
    .long   2                       ## Header Bucket Count
    .long   2                       ## Header Hash Count
    .long   20                      ## Header Data Length
    .long   0                       ## HeaderData Die Offset Base
    .long   3                       ## HeaderData Atom Count
    .short  1                       ## eAtomTypeDIEOffset
    .short  6                       ## DW_FORM_data4
    .short  3                       ## eAtomTypeTag
    .short  5                       ## DW_FORM_data2
    .short  5                       ## eAtomTypeTypeFlags
    .short  11                      ## DW_FORM_data1
    .long   0                       ## Bucket 0
    .long   1                       ## Bucket 1
    .long   193495088               ## Hash in Bucket 0
    .long   2090147939              ## Hash in Bucket 1
    .long   Ltypes0-Ltypes_begin    ## Offset in Bucket 0
    .long   Ltypes1-Ltypes_begin    ## Offset in Bucket 1
Ltypes0:
Lset12 = Lstring4-Lsection_str          ## int
    .long   Lset12
    .long   1                       ## Num DIEs
    .long   101
    .short  36
    .byte   0
    .long   0
Ltypes1:
Lset13 = Lstring7-Lsection_str          ## char
    .long   Lset13
    .long   1                       ## Num DIEs
    .long   108
    .short  36
    .byte   0
    .long   0
    .section    __DWARF,__debug_pubtypes,regular,debug
Lset14 = Lpubtypes_end1-Lpubtypes_begin1 ## Length of Public Types Info
    .long   Lset14
Lpubtypes_begin1:
    .short  2                       ## DWARF Version
Lset15 = Linfo_begin1-Lsection_info     ## Offset of Compilation Unit Info
    .long   Lset15
Lset16 = Linfo_end1-Linfo_begin1        ## Compilation Unit Length
    .long   Lset16
    .long   0                       ## End Mark
Lpubtypes_end1:
    .section    __DWARF,__debug_loc,regular,debug
Ldebug_loc0:
    .quad   Lfunc_begin0
    .quad   Ltmp6
Lset17 = Ltmp8-Ltmp7                    ## Loc expr size
    .short  Lset17
Ltmp7:
    .byte   85                      ## DW_OP_reg5
Ltmp8:
    .quad   0
    .quad   0
Ldebug_loc2:
    .quad   Lfunc_begin0
    .quad   Ltmp6
Lset18 = Ltmp10-Ltmp9                   ## Loc expr size
    .short  Lset18
Ltmp9:
    .byte   84                      ## DW_OP_reg4
Ltmp10:
    .quad   0
    .quad   0
Ldebug_loc4:
    .section    __DWARF,__debug_aranges,regular,debug
    .section    __DWARF,__debug_ranges,regular,debug
    .section    __DWARF,__debug_macinfo,regular,debug
    .section    __DWARF,__debug_inlined,regular,debug
Lset19 = Ldebug_inlined_end1-Ldebug_inlined_begin1 ## Length of Debug Inlined Information Entry
    .long   Lset19
Ldebug_inlined_begin1:
    .short  2                       ## Dwarf Version
    .byte   8                       ## Address Size (in bytes)
Ldebug_inlined_end1:
    .section    __DWARF,__debug_str,regular,debug
Lstring0:
    .asciz   "Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)"
Lstring1:
    .asciz   "/Users/####/Desktop/Tiny/Tiny/main.c"
Lstring2:
    .asciz   "/Users/####/Desktop/Tiny"
Lstring3:
    .asciz   "main"
Lstring4:
    .asciz   "int"
Lstring5:
    .asciz   "argc"
Lstring6:
    .asciz   "argv"
Lstring7:
    .asciz   "char"

.subsections_via_symbols
like image 672
Ephemera Avatar asked Dec 07 '22 09:12

Ephemera


1 Answers

Firstly, why is so much DWARF debug info included in the Release build of a C executable?

Being able to debug optimized code is incredibly useful. Cases in which bugs are only visible in optimized builds are not rare. If you're hand writing assembly you're unlikely to care about DWARF information though, so I'd suggest building your comparison code without the -g argument.


Secondly, I notice the Mach-O header data is included 4 times (in different DWARF-related sections). Why is this necessary?

These aren't Mach-O headers that you're seeing. They're the headers for DWARF accelerator tables, an LLVM extension to DWARF that optimizes the test for whether a symbol is defined within a given compilation unit.


But in the 248B program, these are all nowhere to be seen - the program instead begins at _start. How is that possible if all programs by definition begin in main?

Historically on OS X all programs begin at start. However, this symbol typically comes from a system library rather than being defined by the program itself. The system implementation of start will perform some initialization and then jump to your programs "real" entry point.

The entry points to Mach-O binaries is defined by either the LC_UNIXTHREAD or LC_MAIN load commands. When LC_UNIXTHREAD, the convention for pre-10.8 versions of OS X, is used with a regular C or C++ program the linker uses start as the entry point. This symbol typically comes from /usr/lib/crt1.o, and its address is written in to the instruction pointer field of the LC_UNIXTHREAD load command. The 248B binary you link to includes an LC_UNIXTHREAD command with eip set to 0x000010e8. That's the address of the symbol _start. Since this small program is a static executable and the binary is generated directly it can write whatever address it wishes to in to the instruction pointer field of the load command.

If you're building your executable targeting OS X 10.8+ the linker will generate an LC_MAIN load command instead of LC_UNIXTHREAD. The kernel knows that binaries using the LC_MAIN command should be executed by loading the dynamic linker and jumping to its entry point. The dynamic linker, dyld, initializes itself and then jumps to the address specified in the LC_MAIN command. In this brave new world no symbol named start is used at all.

like image 152
bdash Avatar answered Dec 31 '22 09:12

bdash