
How do I merge two binary executables?

Tags: c, linux, linker, elf, bfd

This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The difference is that the previous question dealt with merging an object file into a fully linked executable, which is even harder because it means I need to deal with relocations manually.

What I have are the following files:

example-target.c:

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    puts("1234");
    return EXIT_SUCCESS;
}

example-embed.c:

#include <stdlib.h>
#include <stdio.h>

/*
 * Fake main. Never used, just there so we can perform a full link.
 */
int main(void)
{
    return EXIT_SUCCESS;
}

void func1(void)
{
    puts("asdf");
}

My goal is to merge these two executables to produce a final executable which is the same as example-target, but additionally has another main and func1.

From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses, so that if I were to merge them, the sections would overlap.
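The clash described above comes down to an interval-overlap test on each section's load address and size. The following is only a minimal sketch of that test: `struct sec_range`, `ranges_overlap` and `first_clash` are hypothetical names that do not come from BFD, and in the real tool the values would be read from the section list of each binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one section's load address and size. */
struct sec_range {
    const char *name;
    uint64_t    addr;
    uint64_t    size;
};

/* Two half-open ranges [addr, addr + size) overlap if each one starts
 * before the other ends. Empty sections never overlap anything. */
bool ranges_overlap(const struct sec_range *a, const struct sec_range *b)
{
    assert(a != NULL && b != NULL);
    if (a->size == 0 || b->size == 0)
        return false;
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* Return the index of the first section in `others` that overlaps `s`,
 * or -1 if there is none. */
long first_clash(const struct sec_range *s,
                 const struct sec_range *others, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(s, &others[i]))
            return (long)i;
    return -1;
}
```

Running `first_clash` for every section of example-embed against the section list of example-target would report whether a given link of example-embed is safe to merge.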

What I did to solve this was to analyse example-target programmatically to get a list of the load address and sizes of each of its sections. I then did the same for example-embed and used this information to dynamically generate a linker command for example-embed.c which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target. Hence example-embed is actually fully linked twice in this process: once to determine how many sections and what sizes they are, and once again to link with a guarantee that there are no section clashes with example-target.

On my system, the linker command produced is:

-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000

(Note that I prefixed section names with .new using objcopy --prefix-sections=.new example-embedobj to avoid section name clashes.)
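The generator itself is not shown here, so the following is only a sketch of how such an option string could be built. It assumes, purely for illustration, that every section keeps its address from the first link plus one constant offset; `struct sec_addr`, `build_section_start_flags` and the `delta` parameter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: section name and its address from the first link. */
struct sec_addr {
    const char *name;
    uint64_t    addr;
};

/* Write one --section-start option per section into `out`, shifting every
 * address by `delta` (an assumed relocation strategy). Returns the number
 * of characters written, excluding the terminator, or -1 if `out` is too
 * small. */
int build_section_start_flags(char *out, size_t outsz,
                              const struct sec_addr *secs, size_t n,
                              uint64_t delta)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used,
                         "%s--section-start=%s=0x%llX",
                         i == 0 ? "-Wl," : ",",
                         secs[i].name,
                         (unsigned long long)(secs[i].addr + delta));
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;          /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

The resulting string can be passed to the compiler driver as a single argument when example-embed is linked for the second time.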

I then wrote some code to generate a new executable, borrowing some code from both objcopy and the book Security Warrior. The new executable should have:

  • All the sections of example-target and all the sections of example-embed
  • A symbol table which contains all the symbols from example-target and all the symbols of example-embed

The code I wrote is:

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>

struct COPYSECTION_DATA {
    bfd *      obfd;
    asymbol ** syms;
    int        symsize;
    int        symcount;
};

void copy_section(bfd * ibfd, asection * section, PTR data)
{
    struct COPYSECTION_DATA * csd  = data;
    bfd *             obfd = csd->obfd;
    asection *        s;
    long              size, count, sz_reloc;

    if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
        return;
    }

    /* get output section from input section struct */
    s        = section->output_section;
    /* get sizes for copy */
    size     = bfd_get_section_size(section);
    sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);

    if(!sz_reloc) {
        /* no relocations */
        bfd_set_reloc(obfd, s, NULL, 0);
    } else if(sz_reloc > 0) {
        arelent ** buf;

        /* build relocations */
        buf   = xmalloc(sz_reloc);
        count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);
        /* set relocations for the output section */
        bfd_set_reloc(obfd, s, count ? buf : NULL, count);
        free(buf);
    }

    /* get input section contents, set output section contents */
    if(section->flags & SEC_HAS_CONTENTS) {
        bfd_byte * memhunk = NULL;
        bfd_get_full_section_contents(ibfd, section, &memhunk);
        bfd_set_section_contents(obfd, s, memhunk, 0, size);
        free(memhunk);
    }
}

void define_section(bfd * ibfd, asection * section, PTR data)
{
    bfd *      obfd = data;
    asection * s    = bfd_make_section_anyway_with_flags(obfd,
            section->name, bfd_get_section_flags(ibfd, section));
    /* set size to same as ibfd section */
    bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));

    /* set vma */
    bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));
    /* set load address */
    s->lma = section->lma;
    /* set alignment -- the power 2 will be raised to */
    bfd_set_section_alignment(obfd, s,
            bfd_section_alignment(ibfd, section));
    s->alignment_power = section->alignment_power;
    /* link the output section to the input section */
    section->output_section = s;
    section->output_offset  = 0;

    /* copy merge entity size */
    s->entsize = section->entsize;

    /* copy private BFD data from ibfd section to obfd section */
    bfd_copy_private_section_data(ibfd, section, obfd, s);
}

void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
        struct COPYSECTION_DATA * csd)
{
    /* set obfd */
    csd->obfd     = obfd;

    /* get required size for both symbol tables and allocate memory */
    csd->symsize  = bfd_get_symtab_upper_bound(ibfd) /********+
            bfd_get_symtab_upper_bound(embedbfd) */;
    csd->syms     = xmalloc(csd->symsize);

    csd->symcount =  bfd_canonicalize_symtab (ibfd, csd->syms);
    /******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
            csd->syms + csd->symcount); */

    /* copy merged symbol table to obfd */
    bfd_set_symtab(obfd, csd->syms, csd->symcount);
}

bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
    struct COPYSECTION_DATA csd = {0};

    if(!ibfd || !embedbfd || !obfd) {
        return FALSE;
    }

    /* set output parameters to ibfd settings */
    bfd_set_format(obfd, bfd_get_format(ibfd));
    bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
    bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
            bfd_applicable_file_flags(obfd));

    /* set the entry point of obfd */
    bfd_set_start_address(obfd, bfd_get_start_address(ibfd));

    /* define sections for output file */
    bfd_map_over_sections(ibfd, define_section, obfd);
    /******** bfd_map_over_sections(embedbfd, define_section, obfd); */

    /* merge private data into obfd */
    bfd_merge_private_bfd_data(ibfd, obfd);
    /******** bfd_merge_private_bfd_data(embedbfd, obfd); */

    merge_symtable(ibfd, embedbfd, obfd, &csd);

    bfd_map_over_sections(ibfd, copy_section, &csd);
    /******** bfd_map_over_sections(embedbfd, copy_section, &csd); */

    free(csd.syms);
    return TRUE;
}

int main(int argc, char **argv)
{
    bfd * ibfd;
    bfd * embedbfd;
    bfd * obfd;

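    if(argc != 4) {
        /* usage errors do not set errno, so perror() is not appropriate */
        fprintf(stderr, "Usage: %s infile embedfile outfile\n", argv[0]);
        xexit(-1);
    }

    bfd_init();
    ibfd     = bfd_openr(argv[1], NULL);
    embedbfd = bfd_openr(argv[2], NULL);

    if(ibfd == NULL || embedbfd == NULL) {
        /* BFD records its own error state; bfd_perror() prints it */
        bfd_perror("Error opening input files");
        xexit(-1);
    }

    if(!bfd_check_format(ibfd, bfd_object) ||
            !bfd_check_format(embedbfd, bfd_object)) {
        bfd_perror("File format error");
        xexit(-1);
    }

    obfd = bfd_openw(argv[3], NULL);
    if(obfd == NULL) {
        bfd_perror("Error opening output file");
        xexit(-1);
    }
    bfd_set_format(obfd, bfd_object);

    if(!(merge_object(ibfd, embedbfd, obfd))) {
        fprintf(stderr, "Error merging input/obj\n");
        xexit(-1);
    }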

    bfd_close(ibfd);
    bfd_close(embedbfd);
    bfd_close(obfd);
    return EXIT_SUCCESS;
}

To summarise, this code takes two input files (ibfd and embedbfd) and generates an output file (obfd). It:

  • Copies the format/arch/mach/file flags and start address from ibfd to obfd.
  • Defines the sections of both ibfd and embedbfd in obfd. The sections are populated in a separate step because BFD requires that all sections be created before any of them is populated.
  • Merges the private data of both input BFDs into the output BFD. Since BFD is a common abstraction over many file formats, it cannot necessarily encapsulate everything the underlying file format requires.
  • Creates a combined symbol table consisting of the symbol tables of ibfd and embedbfd and sets it as the symbol table of obfd. This symbol table is saved so that it can later be used to build relocation information.
  • Copies the sections from ibfd and embedbfd to obfd. As well as copying the section contents, this step builds and sets the relocation table.
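The combined symbol table step amounts to concatenating two NULL-terminated pointer arrays into one buffer. The sketch below models only that idea, with plain strings standing in for `asymbol *` entries; `count_entries` and `merge_tables` are hypothetical helpers, not BFD functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Count the entries of a NULL-terminated pointer array. */
size_t count_entries(const char *const *tab)
{
    size_t n = 0;

    assert(tab != NULL);
    while (tab[n] != NULL)
        n++;
    return n;
}

/* Concatenate two NULL-terminated tables into a newly allocated,
 * NULL-terminated table. The entries themselves are shared, not copied.
 * The caller frees the result. Returns NULL on allocation failure. */
const char **merge_tables(const char *const *a, const char *const *b,
                          size_t *out_count)
{
    size_t       na     = count_entries(a);
    size_t       nb     = count_entries(b);
    const char **merged = malloc((na + nb + 1) * sizeof *merged);

    if (merged == NULL)
        return NULL;
    memcpy(merged, a, na * sizeof *merged);
    memcpy(merged + na, b, nb * sizeof *merged);
    merged[na + nb] = NULL;
    if (out_count != NULL)
        *out_count = na + nb;
    return merged;
}
```

Note that a plain concatenation keeps duplicates: with the two example programs, both inputs contribute a symbol named main, so the merged table would contain it twice.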

In the code above, some lines are commented out with /******** */. These lines deal with the merging of example-embed. With them commented out, obfd is simply built as a copy of ibfd. I have tested this and it works fine. However, once I uncomment these lines, the problems start.

With the uncommented version which does the full merge, it still generates an output file. This output file can be inspected with objdump and found to have all the sections, code and symbol tables of both inputs. However, objdump complains with:

BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708

On my system, line 1708 of elf.c is:

BFD_ASSERT (elf_dynsymtab (abfd) == 0);

elf_dynsymtab is a macro in elf-bfd.h for:

#define elf_dynsymtab(bfd)  (elf_tdata(bfd) -> dynsymtab_section)

I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps a complaint that it's not present). For the time being, I am trying to avoid reaching down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong, either in my code or conceptually?
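One way to narrow this down without going through BFD would be to count how many section headers of type SHT_DYNSYM the merged file contains. This is only a diagnostic sketch built on a guess: the assumption is that each input contributes its own dynamic symbol table section, so the output may carry two, while the asserted macro tracks a single one. The helper `count_sections_of_type` is hypothetical, and the code assumes a 64-bit ELF file whose section headers have already been read into memory.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Count the section headers whose sh_type equals `type`
 * (for example SHT_DYNSYM). */
size_t count_sections_of_type(const Elf64_Shdr *shdrs, size_t n,
                              uint32_t type)
{
    size_t hits = 0;

    assert(shdrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (shdrs[i].sh_type == type)
            hits++;
    return hits;
}
```

The same information can be read off the Type column of `readelf -S` on the merged output; a count above one for DYNSYM would support the guess, and a count of one would rule it out.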

If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.


I realise that this is a very large question, and for this reason I would like to properly reward anyone who is able to help me with it. If someone helps me solve this, I am happy to award a 500+ bonus.

asked Mar 15 '12 14:03 by Mike Kwan


1 Answer

Why do all of this manually? Given that you have all symbol information (which you must if you want to edit the binary in a sane way), wouldn't it be easier to SPLIT the executable into separate object files (say, one object file per function), do your editing, and relink it?

answered Sep 28 '22 05:09 by zvrba