This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The difference is that the previous question deals with merging an object file to a full linked executable which is even harder because it means I need to manually deal with relocations.
What I have are the following files:
example-target.c
:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
puts("1234");
return EXIT_SUCCESS;
}
example-embed.c
:
#include <stdlib.h>
#include <stdio.h>
/*
* Fake main. Never used, just there so we can perform a full link.
*/
int main(void)
{
return EXIT_SUCCESS;
}
void func1(void)
{
puts("asdf");
}
My goal is to merge these two executables to produce a final executable which is the same as example-target
, but additionally has another main
and func1
.
From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses (such that if I was to merge them, the sections would overlap).
What I did to solve this was to analyse example-target
programmatically to get a list of the load address and sizes of each of its sections. I then did the same for example-embed
and used this information to dynamically generate a linker command for example-embed.c
which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target
. Hence example-embed
is actually fully linked twice in this process: once to determine how many sections and what sizes they are, and once again to link with a guarantee that there are no section clashes with example-target
.
On my system, the linker command produced is:
-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000
(Note that I prefixed section names with .new
using objcopy --prefix-sections=.new example-embedobj
to avoid section name clashes.)
I then wrote some code to generate a new executable (borrowed some code both from objcopy
and Security Warrior
book). The new executable should have:
example-target
and all the sections of example-embed
example-target
and all the symbols of example-embed
The code I wrote is:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>
struct COPYSECTION_DATA {
bfd * obfd;
asymbol ** syms;
int symsize;
int symcount;
};
void copy_section(bfd * ibfd, asection * section, PTR data)
{
struct COPYSECTION_DATA * csd = data;
bfd * obfd = csd->obfd;
asection * s;
long size, count, sz_reloc;
if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
return;
}
/* get output section from input section struct */
s = section->output_section;
/* get sizes for copy */
size = bfd_get_section_size(section);
sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);
if(!sz_reloc) {
/* no relocations */
bfd_set_reloc(obfd, s, NULL, 0);
} else if(sz_reloc > 0) {
arelent ** buf;
/* build relocations */
buf = xmalloc(sz_reloc);
count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);
/* set relocations for the output section */
bfd_set_reloc(obfd, s, count ? buf : NULL, count);
free(buf);
}
/* get input section contents, set output section contents */
if(section->flags & SEC_HAS_CONTENTS) {
bfd_byte * memhunk = NULL;
bfd_get_full_section_contents(ibfd, section, &memhunk);
bfd_set_section_contents(obfd, s, memhunk, 0, size);
free(memhunk);
}
}
void define_section(bfd * ibfd, asection * section, PTR data)
{
bfd * obfd = data;
asection * s = bfd_make_section_anyway_with_flags(obfd,
section->name, bfd_get_section_flags(ibfd, section));
/* set size to same as ibfd section */
bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));
/* set vma */
bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));
/* set load address */
s->lma = section->lma;
/* set alignment -- the power 2 will be raised to */
bfd_set_section_alignment(obfd, s,
bfd_section_alignment(ibfd, section));
s->alignment_power = section->alignment_power;
/* link the output section to the input section */
section->output_section = s;
section->output_offset = 0;
/* copy merge entity size */
s->entsize = section->entsize;
/* copy private BFD data from ibfd section to obfd section */
bfd_copy_private_section_data(ibfd, section, obfd, s);
}
void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
struct COPYSECTION_DATA * csd)
{
/* set obfd */
csd->obfd = obfd;
/* get required size for both symbol tables and allocate memory */
csd->symsize = bfd_get_symtab_upper_bound(ibfd) /********+
bfd_get_symtab_upper_bound(embedbfd) */;
csd->syms = xmalloc(csd->symsize);
csd->symcount = bfd_canonicalize_symtab (ibfd, csd->syms);
/******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
csd->syms + csd->symcount); */
/* copy merged symbol table to obfd */
bfd_set_symtab(obfd, csd->syms, csd->symcount);
}
bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
struct COPYSECTION_DATA csd = {0};
if(!ibfd || !embedbfd || !obfd) {
return FALSE;
}
/* set output parameters to ibfd settings */
bfd_set_format(obfd, bfd_get_format(ibfd));
bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
bfd_applicable_file_flags(obfd));
/* set the entry point of obfd */
bfd_set_start_address(obfd, bfd_get_start_address(ibfd));
/* define sections for output file */
bfd_map_over_sections(ibfd, define_section, obfd);
/******** bfd_map_over_sections(embedbfd, define_section, obfd); */
/* merge private data into obfd */
bfd_merge_private_bfd_data(ibfd, obfd);
/******** bfd_merge_private_bfd_data(embedbfd, obfd); */
merge_symtable(ibfd, embedbfd, obfd, &csd);
bfd_map_over_sections(ibfd, copy_section, &csd);
/******** bfd_map_over_sections(embedbfd, copy_section, &csd); */
free(csd.syms);
return TRUE;
}
int main(int argc, char **argv)
{
bfd * ibfd;
bfd * embedbfd;
bfd * obfd;
if(argc != 4) {
perror("Usage: infile embedfile outfile\n");
xexit(-1);
}
bfd_init();
ibfd = bfd_openr(argv[1], NULL);
embedbfd = bfd_openr(argv[2], NULL);
if(ibfd == NULL || embedbfd == NULL) {
perror("asdfasdf");
xexit(-1);
}
if(!bfd_check_format(ibfd, bfd_object) ||
!bfd_check_format(embedbfd, bfd_object)) {
perror("File format error");
xexit(-1);
}
obfd = bfd_openw(argv[3], NULL);
bfd_set_format(obfd, bfd_object);
if(!(merge_object(ibfd, embedbfd, obfd))) {
perror("Error merging input/obj");
xexit(-1);
}
bfd_close(ibfd);
bfd_close(embedbfd);
bfd_close(obfd);
return EXIT_SUCCESS;
}
To summarise what this code does, it takes 2 input files (ibfd
and embedbfd
) to generate an output file (obfd
).
ibfd
to obfd
ibfd
and embedbfd
to obfd
. Population of the sections happens separately because BFD mandates that all sections are created before any start to be populated.ibfd
and embedbfd
and set this as the symbol table of obfd
. This symbol table is saved so it can later be used to build relocation information.ibfd
to obfd
. As well as copying the section contents, this step also deals with building and setting the relocation table.In the code above, some lines are commented out with /******** */
. These lines deal with the merging of example-embed
. If they are commented out, what happens is that obfd
is simply built as a copy of ibfd
. I have tested this and it works fine. However, once I comment these lines back in the problems start occurring.
With the uncommented version which does the full merge, it still generates an output file. This output file can be inspected with objdump
and found to have all the sections, code and symbol tables of both inputs. However, objdump
complains with:
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
On my system, 1708
of elf.c
is:
BFD_ASSERT (elf_dynsymtab (abfd) == 0);
elf_dynsymtab
is a macro in elf-bfd.h
for:
#define elf_dynsymtab(bfd) (elf_tdata(bfd) -> dynsymtab_section)
I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps saying it's not present). For the time, I am trying to avoid having to reach down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong either in my code or conceptually?
If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.
I realise that this is a very large question and for this reason, I would like to properly reward anyone who is able to help me with it. If I am able to solve this with the help of someone, I am happy to award a 500+ bonus.
Why do all of this manually? Given that you have all symbol information (which you must if you want to edit the binary in a sane way), wouldn't it be easier to SPLIT the executable into separate object files (say, one object file per function), do your editing, and relink it?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With