I have spent the past few days experimenting with assembly, and now understand the relationship between assembly and machine code (using x86 via NASM on OSX, reading the Intel docs).
Now I am trying to understand the details of how the linker works, and specifically want to understand the structure of Mach-O object files, starting with the Mach-O headers.
My question is: can you map out how the Mach-O headers below map to the otool command output (which displays the headers, but in a different format)?
Some reasons for this question include:
Below I show the example and process I went through to try to decode the Mach-O header from a real object file. Throughout the descriptions below, I try to show hints of all the little/subtle questions that arise. Hopefully this will provide a sense of how this can be very confusing to a newcomer.
Starting with a basic C file called example.c:
#include <stdio.h>

int main() {
    printf("hello world");
    return 0;
}
Compile it with gcc example.c -o example.out, which gives a binary whose hex dump begins:
cffa edfe 0700 0001 0300 0080 0200 0000
1000 0000 1005 0000 8500 2000 0000 0000
1900 0000 4800 0000 5f5f 5041 4745 5a45
524f 0000 0000 0000 0000 0000 0000 0000
0000 0000 0100 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 1900 0000 2802 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
0000 0000 0100 0000 0010 0000 0000 0000
0000 0000 0000 0000 0010 0000 0000 0000
0700 0000 0500 0000 0600 0000 0000 0000
5f5f 7465 7874 0000 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
400f 0000 0100 0000 2d00 0000 0000 0000
400f 0000 0400 0000 0000 0000 0000 0000
0004 0080 0000 0000 0000 0000 0000 0000
5f5f 7374 7562 7300 0000 0000 0000 0000
5f5f 5445 5854 0000 0000 0000 0000 0000
6e0f 0000 0100 0000 0600 0000 0000 0000
6e0f 0000 0100 0000 0000 0000 0000 0000
0804 0080 0000 0000 0600 0000 0000 0000
5f5f 7374 7562 5f68 656c 7065 7200 0000
... 531 total lines of this
Run otool -h example.out, which prints:
example.out:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x80           2    16       1296 0x00200085
To understand the Mach-O file format, I found these resources helpful:
Those last 3 from opensource.apple.com contain all the constants, such as these:
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
...
#define CPU_TYPE_MC680x0 ((cpu_type_t) 6)
#define CPU_TYPE_X86 ((cpu_type_t) 7)
#define CPU_TYPE_I386 CPU_TYPE_X86 /* compatibility */
#define CPU_TYPE_X86_64 (CPU_TYPE_X86 | CPU_ARCH_ABI64)
The structure of the Mach-O header is shown as:
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
Given this information, the goal was to find each of those pieces of the Mach-O header in the example.out object file.
Given that example and research, I was able to identify the first part of the Mach-O header, the "magic number". That was cool.
But it wasn't a straightforward process. Here are the pieces of information that had to be collected to figure that out.
The otool output shows "magic" to be 0xfeedfacf.
The magic number is either MH_MAGIC or MH_CIGAM ("magic" in reverse). I found those through Google in mach-o/loader.h. Since I am using a 64-bit architecture and not 32-bit, I went with MH_MAGIC_64 (0xfeedfacf) and MH_CIGAM_64 (0xcffaedfe).
I looked in the example.out file, and the first 8 hex digits were cffa edfe, which matches MH_CIGAM_64! It's in a different format, which throws you off a little bit, but they are 2 different hex formats that are close enough to see the connection. They are also reversed.
Here are the 3 numbers, which were enough to sort of figure out what the magic number is:
0xcffaedfe // value from MH_CIGAM_64
0xfeedfacf // value from otool
cffa edfe // value in example.out
So that's exciting! Still not totally sure if I am coming to the right conclusion about these numbers, but hope so.
Now it starts to get confusing. Here are the pieces that needed to be put together to almost make sense of it, but this is where I'm stuck so far:
otool shows the cputype to be 16777223. This Apple Stack Exchange question gave some hints on how to understand this.
I found CPU_TYPE_X86_64 in mach/machine.h, and had to do several calculations to figure out its value.
Here are the relevant constants needed to calculate the value of CPU_TYPE_X86_64:
#define CPU_ARCH_ABI64 0x01000000 /* 64 bit ABI */
#define CPU_TYPE_X86 ((cpu_type_t) 7)
#define CPU_TYPE_I386 CPU_TYPE_X86 /* compatibility */
#define CPU_TYPE_X86_64 (CPU_TYPE_X86 | CPU_ARCH_ABI64)
So basically:
CPU_TYPE_X86_64 = 7 BITWISEOR 0x01000000 // 16777223
That number, 16777223, matches what is shown by otool, nice!
Next, I tried to find that number in example.out, but it doesn't appear there because it is a decimal number. I converted it to hex in JavaScript:
> (16777223).toString(16)
'1000007'
So I'm not sure if this is the correct way to generate a hex number, especially one that will match the hex numbers in a Mach-O object file. 1000007 is only 7 digits too, so I don't know if you are supposed to "pad" it or something.
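On the padding question: a 32-bit field occupies 4 bytes, which is 8 hex digits, so the value is normally written with a leading zero. Here is a small sketch in Python (used only for illustration, it is not part of the toolchain above) with the two constants copied from the mach/machine.h excerpt:

```python
# Constants copied from the mach/machine.h excerpt above.
CPU_ARCH_ABI64 = 0x01000000  # 64-bit ABI flag
CPU_TYPE_X86 = 7

# Bitwise OR sets the ABI flag on top of the base CPU type.
CPU_TYPE_X86_64 = CPU_TYPE_X86 | CPU_ARCH_ABI64

print(CPU_TYPE_X86_64)                 # decimal, the form otool prints
print(format(CPU_TYPE_X86_64, "08x"))  # zero-padded to 8 hex digits (4 bytes)
```

The "08x" format spec pads with zeros to a width of 8, which gives 01000007 rather than the 7-digit string that toString(16) returned.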
Anyway, you see this number in example.out, right after the magic number:
0700 0001
Hmm, they seem somewhat related:
0700 0001
1000007
It looks like a 0 was added to the end of 1000007, and that it was reversed.
At this point I wanted to ask the question, having already spent a few hours getting here. How does the structure of the Mach-O header map to the actual Mach-O object file? Can you show how each part of the header shows up in the example.out file above, with a brief explanation why?
Part of what's confusing you is endianness. In this case, the header is stored in the native format for the platform. Intel-compatible platforms are little-endian systems, meaning the least-significant byte of a multi-byte value is first in the byte sequence.
So, the byte sequence 07 00 00 01, when interpreted as a little-endian 32-bit value, corresponds to 0x01000007.
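To make the byte-order point concrete, here is a minimal Python sketch (Python is my choice for illustration, not something from the question) that interprets the same four bytes both ways:

```python
raw = bytes.fromhex("07000001")  # the four cputype bytes, in file order

little = int.from_bytes(raw, "little")  # least-significant byte first
big = int.from_bytes(raw, "big")        # most-significant byte first

print(hex(little))  # 0x1000007
print(hex(big))     # 0x7000001
```

Only the little-endian reading gives 16777223, the value otool reports for cputype.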
The other thing you need to know to interpret the structure is the size of each field. All of the uint32_t fields are pretty straightforward. They are 32-bit unsigned integers.
Both cpu_type_t and cpu_subtype_t are defined in the machine.h you linked to as equivalent to integer_t. integer_t is defined as equivalent to int in /usr/include/mach/i386/vm_types.h. OS X is an LP64 platform, which means that longs and pointers are sensitive to the architecture (32- vs. 64-bit), but int is not. It's always 32-bit.
So, all of the fields are 32 bits or 4 bytes in size. Since there are 8 fields, that's a total of 32 bytes.
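One way to double-check the size arithmetic is to mirror the struct with Python's ctypes module. This is only a sketch: the class name MachHeader64 is made up for the example, and the signed 32-bit types for cputype and cpusubtype follow the int reasoning above.

```python
import ctypes

class MachHeader64(ctypes.LittleEndianStructure):
    """Illustrative mirror of struct mach_header_64 (name is hypothetical)."""
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("cputype", ctypes.c_int32),     # cpu_type_t is an int
        ("cpusubtype", ctypes.c_int32),  # cpu_subtype_t is an int
        ("filetype", ctypes.c_uint32),
        ("ncmds", ctypes.c_uint32),
        ("sizeofcmds", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("reserved", ctypes.c_uint32),
    ]

print(ctypes.sizeof(MachHeader64))  # 32
```

Eight 4-byte fields with no padding between them give the 32 bytes mentioned above.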
From your original hexdump, here's the part which corresponds to the header:
cffa edfe 0700 0001 0300 0080 0200 0000
1000 0000 1005 0000 8500 2000 0000 0000
Broken out by field:
struct mach_header_64 {
    uint32_t      magic;       cf fa ed fe -> 0xfeedfacf
    cpu_type_t    cputype;     07 00 00 01 -> 0x01000007
    cpu_subtype_t cpusubtype;  03 00 00 80 -> 0x80000003
    uint32_t      filetype;    02 00 00 00 -> 0x00000002
    uint32_t      ncmds;       10 00 00 00 -> 0x00000010
    uint32_t      sizeofcmds;  10 05 00 00 -> 0x00000510
    uint32_t      flags;       85 00 20 00 -> 0x00200085
    uint32_t      reserved;    00 00 00 00 -> 0x00000000
};
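The same breakdown can be reproduced mechanically. The Python sketch below unpacks the 32 header bytes from the hex dump and prints them in the order otool -h uses. Two things here are my assumptions rather than anything stated above: that otool derives its "caps" column from the top byte of cpusubtype (using CPU_SUBTYPE_MASK from mach/machine.h), and that reading every field as unsigned is acceptable for these particular values.

```python
import struct

# First 32 bytes of example.out, copied from the hex dump above.
HEADER_HEX = (
    "cffaedfe 07000001 03000080 02000000 "
    "10000000 10050000 85002000 00000000"
)
FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
          "ncmds", "sizeofcmds", "flags", "reserved")

raw = bytes.fromhex(HEADER_HEX)
# "<" = little-endian, "8I" = eight unsigned 32-bit integers.
header = dict(zip(FIELDS, struct.unpack("<8I", raw)))

# Assumption: otool shows the top byte of cpusubtype as "caps"
# and the remaining bits as the subtype itself.
CPU_SUBTYPE_MASK = 0xFF000000
caps = (header["cpusubtype"] & CPU_SUBTYPE_MASK) >> 24
subtype = header["cpusubtype"] & ~CPU_SUBTYPE_MASK

print(hex(header["magic"]), header["cputype"], subtype, hex(caps),
      header["filetype"], header["ncmds"], header["sizeofcmds"],
      "0x%08x" % header["flags"])
# prints: 0xfeedfacf 16777223 3 0x80 2 16 1296 0x00200085
```

The remaining differences from the raw bytes are just number bases: otool prints ncmds and sizeofcmds in decimal, so 0x10 appears as 16 and 0x510 as 1296.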
MAGIC or CIGAM gives you a hint about the byte ordering used in the file. When you read the first four bytes as cffaedfe, this means that you should interpret every 4-byte value as little-endian, that is, numbers are written with the units first, then the tens, and so on. So, when you read 07000001, it represents the number 01000007, which is exactly what you were expecting (1000007) except for the leading 0. May I suggest you read about byte ordering?
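As a rough sketch of that idea, the helper below (byte_order is a made-up name, written in Python for illustration) looks at the first four bytes exactly as they appear in the file and reports which byte order to use for the rest of the header. It only handles the two 64-bit constants quoted from mach-o/loader.h.

```python
MH_MAGIC_64 = 0xFEEDFACF  # from mach-o/loader.h
MH_CIGAM_64 = 0xCFFAEDFE  # byte-swapped MH_MAGIC_64

def byte_order(first_four: bytes) -> str:
    """Return 'little' or 'big' for a 64-bit Mach-O file (hypothetical helper)."""
    # Read the bytes in the order they are written in the file.
    as_written = int.from_bytes(first_four, "big")
    if as_written == MH_CIGAM_64:
        return "little"  # magic is stored least-significant byte first
    if as_written == MH_MAGIC_64:
        return "big"
    raise ValueError("not a 64-bit Mach-O magic number")

order = byte_order(bytes.fromhex("cffaedfe"))
print(order)  # little
print(hex(int.from_bytes(bytes.fromhex("07000001"), order)))  # 0x1000007
```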