
Getting the ELF header of the main executable

Tags: c++, c, linux, elf

For various purposes, I am trying to obtain the address of the ELF header of the main executable without parsing /proc/self/maps. I have tried walking the link_map chain returned by the dlopen/dlinfo functions, but it does not contain an entry whose l_addr points to the base address of the main executable. Is there any way to do this (standard or not) without parsing /proc/self/maps?

An example of what I'm trying to do:

#include <stdio.h>
#include <elf.h>
int main()
{
    Elf32_Ehdr* header = /* Somehow obtain the address of the ELF header of this program */;
    printf("%p\n", header);
    /* Read the header and do stuff, etc */
    return 0;
}
小太郎 asked Jan 16 '12 04:01




2 Answers

With glibc, the void * returned by dlopen(0, RTLD_LAZY) is in fact a struct link_map * that corresponds to the main executable.

dl_iterate_phdr likewise passes the entry for the main executable to the very first invocation of the callback.

You are likely confused by the fact that .l_addr == 0 in the link map, and that dlpi_addr == 0 when using dl_iterate_phdr.

This happens because l_addr (and dlpi_addr) don't actually record the load address of an ELF image. Rather, they record the relocation that has been applied to that image.

Usually the main executable is built to load at 0x400000 (for x86_64 Linux) or at 0x08048000 (for ix86 Linux), and it is loaded at that same address (i.e. it is not relocated).

But if you link your executable with the -pie flag, then it will be linked at 0x0 and relocated to some other address at load time.

So how do you get to the ELF header? Easy:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <link.h>
#include <stdio.h>
#include <stdlib.h>

static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
  int j;
  static int once = 0;

  if (once) return 0;
  once = 1;

  printf("relocation: 0x%lx\n", (unsigned long)info->dlpi_addr);

  for (j = 0; j < info->dlpi_phnum; j++) {
    if (info->dlpi_phdr[j].p_type == PT_LOAD) {
      printf("a.out loaded at %p\n",
             (void *) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
      break;
    }
  }
  return 0;
}

int
main(int argc, char *argv[])
{
  dl_iterate_phdr(callback, NULL);
  exit(EXIT_SUCCESS);
}


$ gcc -m32 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x8048000

$ gcc -m64 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x400000

$ gcc -m32 -pie -fPIC t.c && ./a.out
relocation: 0xf7789000
a.out loaded at 0xf7789000

$ gcc -m64 -pie -fPIC t.c && ./a.out
relocation: 0x7f3824964000
a.out loaded at 0x7f3824964000

Update:

Why does the man page say "base address" and not relocation?

It's a bug ;-)

I am guessing that the man page was written long before prelink, PIE, and ASLR existed. Without prelink, shared libraries are always linked to load at address 0x0, and then relocation and base address become one and the same.

How come dlpi_name points to an empty string when info refers to the main executable?

It's an accident of implementation.

The way this works is that the kernel open(2)s the executable and passes the open file descriptor to the loader (in the auxv[] vector, as AT_EXECFD). Everything the loader knows about the executable it gets by reading that file descriptor.

There is no easy way on UNIX to map a file descriptor back to the name it was opened as. For one thing, UNIX supports hard links, so there could be multiple filenames that refer to the same file.

Newer Linux kernels also pass in the name that was used to execve(2) the executable (also in auxv[], as AT_EXECFN). But that is optional, and even when it is passed in, glibc doesn't put it into .l_name / dlpi_name, so as not to break existing programs that have come to depend on the name being empty.

Instead, glibc saves that name in __progname and __progname_full.

The loader could readlink(2) the name from /proc/self/exe on systems that don't provide AT_EXECFN, but the /proc file system is not guaranteed to be mounted either, so that would still leave it with an empty name sometimes.

Employed Russian answered Oct 01 '22 17:10


There is the glibc dl_iterate_phdr() function. I'm not sure it gives you exactly what you want, but that is as close as I know:

"The dl_iterate_phdr() function allows an application to inquire at run time to find out which shared objects it has loaded." http://linux.die.net/man/3/dl_iterate_phdr

gby answered Oct 01 '22 18:10