How to read Mach-O header from object file?

I have spent the past few days experimenting with assembly, and now understand the relationship between assembly and machine code (using x86 via NASM on OSX, reading the Intel docs).

Now I am trying to understand the details of how the linker works, and specifically want to understand the structure of Mach-O object files, starting with the Mach-O headers.

My question is: can you show how the bytes of the Mach-O header below map to the output of the otool command (which displays the header, but in a different format)?

Some reasons for this question include:

  • It will help me see how the documents on the "structure of Mach-O headers" look in real-world object files.
  • It will simplify the path to understanding, so that other newcomers and I don't have to spend many hours or days wondering "do they mean this, or that?". Without previous experience, it's hard to mentally translate the general Mach-O documentation into an actual object file in the real world.

Below I show the example and process I went through to try to decode the Mach-O header from a real object file. Throughout the descriptions below, I try to show hints of all the little/subtle questions that arise. Hopefully this will provide a sense of how this can be very confusing to a newcomer.


Example

Starting with a basic C file called example.c:

#include <stdio.h>

int
main() {
  printf("hello world");
  return 0;
}

Compile it with gcc example.c -o example.out. A hex dump of the resulting file begins:

cffa edfe 0700 0001 0300 0080 0200 0000
1000 0000 1005 0000 8500 2000 0000 0000
1900 0000 4800 0000 5f5f 5041 4745 5a45
524f 0000 0000 0000 0000 0000 0000 0000
0000 0000 0100 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 1900 0000 2802 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
0000 0000 0100 0000 0010 0000 0000 0000
0000 0000 0000 0000 0010 0000 0000 0000
0700 0000 0500 0000 0600 0000 0000 0000
5f5f 7465 7874 0000 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
400f 0000 0100 0000 2d00 0000 0000 0000
400f 0000 0400 0000 0000 0000 0000 0000
0004 0080 0000 0000 0000 0000 0000 0000
5f5f 7374 7562 7300 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
6e0f 0000 0100 0000 0600 0000 0000 0000
6e0f 0000 0100 0000 0000 0000 0000 0000
0804 0080 0000 0000 0600 0000 0000 0000
5f5f 7374 7562 5f68 656c 7065 7200 0000
... 531 total lines of this

Run otool -h example.out, which prints:

example.out:
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80          2    16       1296 0x00200085

Research

To understand the Mach-O file format, I found these resources helpful:

  • https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/index.html#//apple_ref/doc/uid/TP40000895
  • https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/index.html
  • https://www.mikeash.com/pyblog/friday-qa-2012-11-30-lets-build-a-mach-o-executable.html
  • http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/EXTERNAL_HEADERS/mach-o/loader.h
  • http://www.opensource.apple.com/source/dtrace/dtrace-78/head/arch.h
  • http://www.opensource.apple.com/source/xnu/xnu-792.13.8/osfmk/mach/machine.h

Those last 3 from opensource.apple.com contain all the constants, such as these:

#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
...
#define CPU_TYPE_MC680x0  ((cpu_type_t) 6)
#define CPU_TYPE_X86    ((cpu_type_t) 7)
#define CPU_TYPE_I386   CPU_TYPE_X86    /* compatibility */
#define CPU_TYPE_X86_64   (CPU_TYPE_X86 | CPU_ARCH_ABI64)

The structure of the Mach-O header is shown as:

struct mach_header_64 {
  uint32_t  magic;    /* mach magic number identifier */
  cpu_type_t  cputype;  /* cpu specifier */
  cpu_subtype_t cpusubtype; /* machine specifier */
  uint32_t  filetype; /* type of file */
  uint32_t  ncmds;    /* number of load commands */
  uint32_t  sizeofcmds; /* the size of all the load commands */
  uint32_t  flags;    /* flags */
  uint32_t  reserved; /* reserved */
};

Given this information, the goal was to find each of those pieces of the Mach-O header in the example.out object file.


First: Finding the "magic" number

Given that example and research, I was able to identify the first part of the Mach-O header, the "magic number". That was cool.

But it wasn't a straightforward process. Here are the pieces of information that had to be collected to figure that out.

  • The first column of the otool output shows "magic" to be 0xfeedfacf.
  • The Apple Mach-O docs say that the magic value should be either MH_MAGIC or MH_CIGAM ("magic" in reverse), so I searched for those and found them in mach-o/loader.h. Since I am on a 64-bit architecture rather than 32-bit, I went with MH_MAGIC_64 (0xfeedfacf) and MH_CIGAM_64 (0xcffaedfe).
  • I looked through example.out, and the first 8 hex digits are cffa edfe, which matches MH_CIGAM_64 (0xcffaedfe)! The dump groups the digits differently, which throws you off a little, but the two are close enough to see the connection. Compared with the otool value, the bytes are also in reverse order.

Here are the 3 numbers, which were enough to sort of figure out what the magic number is:

0xcffaedfe // value from MH_CIGAM_64
0xfeedfacf // value from otool
cffa edfe  // value in example.out

So that's exciting! Still not totally sure if I am coming to the right conclusion about these numbers, but hope so.


Next: Finding the cputype

Now it starts to get confusing. Here are the pieces that needed to be put together to almost make sense of it, but this is where I'm stuck so far:

  • otool shows 16777223. An Apple Stack Exchange question gave some hints on how to understand this.
  • I found CPU_TYPE_X86_64 in mach/machine.h, and had to do several calculations to figure out its value.

Here are the relevant constants needed to calculate the value of CPU_TYPE_X86_64:

#define CPU_ARCH_ABI64  0x01000000      /* 64 bit ABI */
#define CPU_TYPE_X86        ((cpu_type_t) 7)
#define CPU_TYPE_I386       CPU_TYPE_X86        /* compatibility */
#define CPU_TYPE_X86_64     (CPU_TYPE_X86 | CPU_ARCH_ABI64)

So basically:

CPU_TYPE_X86_64 = 7 | 0x01000000 // = 0x01000007 = 16777223

That number 16777223 matches what is shown by otool, nice!
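
As a cross-check, the same calculation can be written in C. This is only a minimal sketch: the two constants are copied from the excerpt above, and cpu_type_with_abi64 is a hypothetical helper name used for illustration, not something from Apple's headers.

```c
#include <stdint.h>

/* Constants copied from the mach/machine.h excerpt above. */
#define CPU_ARCH_ABI64 0x01000000u /* 64-bit ABI flag */
#define CPU_TYPE_X86   7u

/* Hypothetical helper (not part of any Apple header): set the 64-bit ABI
   flag on a base CPU type, the way the CPU_TYPE_X86_64 macro does. */
uint32_t cpu_type_with_abi64(uint32_t base_type) {
    return base_type | CPU_ARCH_ABI64;
}
```

Calling cpu_type_with_abi64(CPU_TYPE_X86) yields 0x01000007, which is 16777223 in decimal.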

Next, I tried to find that number in example.out, but it doesn't appear there as written because 16777223 is decimal and the dump is hexadecimal. So I converted it to hex in JavaScript:

> (16777223).toString(16)
'1000007'

I'm not sure whether this is the correct way to generate a hex number, especially one that will match the hex digits in a Mach-O object file. 1000007 is also only 7 digits, so I don't know whether you are supposed to "pad" it or something.
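
For comparison, a fixed-width conversion can be sketched in C. This assumes the value is meant to be a 32-bit field, which takes exactly 8 hex digits; hex32 is a hypothetical helper name made up for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `value` as exactly 8 lowercase hex digits,
   zero-padded on the left. `out` must have room for 9 bytes
   (8 digits plus the terminating NUL). */
void hex32(uint32_t value, char out[9]) {
    snprintf(out, 9, "%08" PRIx32, value);
}
```

With 16777223 as input, the buffer holds 01000007: the same digits as the JavaScript result, with one leading zero.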

Anyway, you can see this number in example.out, right after the magic number:

0700 0001

Hmm, they seem somewhat related:

0700 0001
1000007

It looks like there was a 0 added to the end of 1000007, and that it was reversed.


Question

At this point, having already spent a few hours getting this far, I wanted to ask the question: how does the structure of the Mach-O header map to the actual Mach-O object file? Can you show how each part of the header shows up in the example.out file above, with a brief explanation of why?

Lance asked Dec 27 '14 17:12


2 Answers

Part of what's confusing you is endianness. In this case, the header is stored in the native format for the platform. Intel-compatible platforms are little-endian systems, meaning the least-significant byte of a multi-byte value is first in the byte sequence.

So, the byte sequence 07 00 00 01, when interpreted as a little-endian 32-bit value, corresponds to 0x01000007.
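
That interpretation can be sketched in a few lines of C. read_le32 is a hypothetical helper name; it builds the value from individual bytes, so it gives the same result regardless of the endianness of the machine running it.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from four bytes that are
   stored least-significant byte first (little-endian). */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Given the bytes 07 00 00 01, read_le32 returns 0x01000007.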

The other thing you need to know to interpret the structure is the size of each field. All of the uint32_t fields are pretty straightforward. They are 32-bit unsigned integers.

Both cpu_type_t and cpu_subtype_t are defined, in the machine.h that you linked, to be equivalent to integer_t. integer_t is in turn defined to be equivalent to int in /usr/include/mach/i386/vm_types.h. OS X is an LP64 platform, which means that the sizes of longs and pointers depend on the architecture (32- vs. 64-bit), but the size of int does not: it is always 32 bits.

So, all of the fields are 32 bits or 4 bytes in size. Since there are 8 fields, that's a total of 32 bytes.
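
If you want the compiler to confirm the arithmetic, a sketch like the following should do it. header64 is a local stand-in for mach_header_64, with int32_t substituted for cpu_type_t and cpu_subtype_t on the assumption described above that both are plain 32-bit ints; it assumes a C11 compiler for _Static_assert.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for mach_header_64 (an assumption for illustration):
   cpu_type_t and cpu_subtype_t are replaced by int32_t. */
struct header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Compile-time checks: eight 4-byte fields with no padding between them. */
_Static_assert(sizeof(struct header64) == 32, "header should be 32 bytes");
_Static_assert(offsetof(struct header64, flags) == 24,
               "flags should start at byte offset 24");
```

If either assumption were wrong, the file would fail to compile rather than silently misreading the header.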

From your original hexdump, here's the part which corresponds to the header:

cffa edfe 0700 0001 0300 0080 0200 0000
1000 0000 1005 0000 8500 2000 0000 0000

Broken out by field:

struct mach_header_64 {
  uint32_t  magic;           cf fa ed fe -> 0xfeedfacf
  cpu_type_t  cputype;       07 00 00 01 -> 0x01000007
  cpu_subtype_t cpusubtype;  03 00 00 80 -> 0x80000003
  uint32_t  filetype;        02 00 00 00 -> 0x00000002
  uint32_t  ncmds;           10 00 00 00 -> 0x00000010
  uint32_t  sizeofcmds;      10 05 00 00 -> 0x00000510
  uint32_t  flags;           85 00 20 00 -> 0x00200085
  uint32_t  reserved;        00 00 00 00 -> 0x00000000
};
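
The same breakdown can be sketched in C. All names here (header_fields, le32_at, decode_header, subtype_caps, subtype_base) are hypothetical and exist only for this example; the byte array is copied from the hex dump in the question. The CPU_SUBTYPE_MASK value is an assumption taken from later versions of mach/machine.h, and it would explain why otool prints cpusubtype 3 and caps 0x80 as separate columns.

```c
#include <stdint.h>

/* Header fields in host byte order; names follow mach_header_64. */
struct header_fields {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

/* The first 32 bytes of example.out, copied from the hex dump. */
const unsigned char example_header[32] = {
    0xcf, 0xfa, 0xed, 0xfe, 0x07, 0x00, 0x00, 0x01,
    0x03, 0x00, 0x00, 0x80, 0x02, 0x00, 0x00, 0x00,
    0x10, 0x00, 0x00, 0x00, 0x10, 0x05, 0x00, 0x00,
    0x85, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Assumed mask for the capability bits in the top byte of cpusubtype. */
#define CPU_SUBTYPE_MASK 0xff000000u

/* Read one little-endian 32-bit value starting at p. */
uint32_t le32_at(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode eight consecutive little-endian 32-bit fields. */
struct header_fields decode_header(const unsigned char *bytes) {
    struct header_fields h;
    h.magic      = le32_at(bytes + 0);
    h.cputype    = le32_at(bytes + 4);
    h.cpusubtype = le32_at(bytes + 8);
    h.filetype   = le32_at(bytes + 12);
    h.ncmds      = le32_at(bytes + 16);
    h.sizeofcmds = le32_at(bytes + 20);
    h.flags      = le32_at(bytes + 24);
    h.reserved   = le32_at(bytes + 28);
    return h;
}

/* Split cpusubtype into capability bits and the subtype proper. */
uint32_t subtype_caps(uint32_t cpusubtype) {
    return (cpusubtype & CPU_SUBTYPE_MASK) >> 24;
}

uint32_t subtype_base(uint32_t cpusubtype) {
    return cpusubtype & ~CPU_SUBTYPE_MASK;
}
```

Decoding example_header this way gives ncmds = 0x10 = 16 and sizeofcmds = 0x510 = 1296, matching the decimal values in the otool output.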
Ken Thomases answered Nov 09 '22 06:11


MAGIC or CIGAM tells you the byte ordering used in the file. When the first four bytes read cffaedfe, it means that any 4-byte value should be interpreted as little-endian, that is, written with the least significant byte first (like writing a decimal number with the units first, then the tens, and so on). So when you read 07000001, it represents the number 01000007, which is exactly what you were expecting (1000007) apart from the leading 0. May I suggest that you read up on byte ordering?
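
A minimal sketch of that check in C, assuming only the 64-bit magic matters: the first four bytes are read both ways, and whichever reading equals MH_MAGIC_64 indicates the byte order of the file. The byte_order enum and detect_order are hypothetical names for this example; MH_MAGIC_64 is the value quoted from mach-o/loader.h in the question.

```c
#include <stdint.h>

#define MH_MAGIC_64 0xfeedfacfu /* value from mach-o/loader.h */

/* Hypothetical result type for this sketch. */
enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

/* Hypothetical helper: inspect the first four bytes of a file and report
   which byte order makes them read as MH_MAGIC_64. */
enum byte_order detect_order(const unsigned char *p) {
    /* Same four bytes, read least-significant-first and most-significant-first. */
    uint32_t as_le = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                   | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    uint32_t as_be = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
                   | ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    if (as_le == MH_MAGIC_64) return ORDER_LITTLE;
    if (as_be == MH_MAGIC_64) return ORDER_BIG;
    return ORDER_UNKNOWN; /* not a 64-bit Mach-O magic number */
}
```

For the bytes cf fa ed fe at the start of example.out, this reports ORDER_LITTLE.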

Jean-Baptiste Yunès answered Nov 09 '22 06:11